ClaudeWave
Skill · 235 repo stars · updated 3d ago

gallery-scraper

Gallery Scraper downloads images in bulk from login-protected gallery websites using an attached Chrome browser session. Use this skill when users request to download, extract, or batch-save images from authenticated gallery pages, convert thumbnails to full-size versions, or collect images across multiple gallery pages without manual downloading.

Install in Claude Code
git clone --depth 1 https://github.com/jdrhyne/agent-skills /tmp/gallery-scraper && cp -r /tmp/gallery-scraper/clawdbot/gallery-scraper ~/.claude/skills/gallery-scraper
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Gallery Scraper

Bulk download images from authenticated gallery websites via browser relay.

## Safety Boundaries

- Do not access gallery sites or user accounts that the user has not explicitly attached and authorized.
- Do not download beyond the selected gallery, profile, or page range without confirmation.
- Do not store cookies, tokens, or hidden form values in local output files.
- Do not keep retrying blocked downloads indefinitely; surface rate limits or auth failures instead.

## Prerequisites

- User must have Chrome with the OpenClaw Browser Relay extension installed
- User must be logged into the target site
- User must attach the browser tab (click relay toolbar button, badge ON)

## Workflow

### 1. Attach Browser Tab

Ask user to:
1. Log into the gallery site in Chrome
2. Navigate to the target gallery/profile page
3. Click the OpenClaw Browser Relay toolbar button (badge shows ON)

### 2. Discover Image URL Pattern

Many gallery sites store full-size URLs in data attributes. Probe the common patterns and report the first selector that matches:

```javascript
// Extract via browser evaluate
() => {
  // Try common patterns
  const patterns = [
    'img[data-max]',           // data-max attribute
    'img[data-src]',           // lazy-load pattern
    'img[data-full]',          // full-size pattern
    'a[data-lightbox] img',    // lightbox galleries
    '.gallery-item img'        // generic gallery
  ];
  
  for (const sel of patterns) {
    const imgs = document.querySelectorAll(sel);
    if (imgs.length > 0) {
      return {
        selector: sel,
        count: imgs.length,
        sample: imgs[0].outerHTML.substring(0, 200)
      };
    }
  }
  return null;
}
```

### 3. Extract Full-Size URLs

Once the pattern is identified, extract all URLs:

```javascript
// For data-max pattern (common)
() => Array.from(document.querySelectorAll('img[data-max]'))
  .map(img => img.dataset.max)

// For thumbnail→full conversion (replace path segment)
() => Array.from(document.querySelectorAll('.gallery img'))
  .map(img => img.src.replace('/thumb/', '/full/'))
```
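
The extracted list can contain blanks, duplicates, or relative paths. A minimal Python sketch for cleaning it before writing `urls.txt` follows; the helper name `normalize_urls` and its `page_url` argument are assumptions for illustration, not part of the skill.

```python
from urllib.parse import urljoin


def normalize_urls(raw_urls, page_url):
    """Resolve URLs against page_url, drop blanks, and de-duplicate in order."""
    seen = set()
    cleaned = []
    for raw in raw_urls:
        if not raw or not raw.strip():
            continue  # skip missing or empty attribute values
        absolute = urljoin(page_url, raw.strip())
        if absolute not in seen:
            seen.add(absolute)
            cleaned.append(absolute)
    return cleaned
```

Pass the array returned by the browser evaluate call together with the gallery page URL, then write one cleaned URL per line to `urls.txt`.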

### 4. Handle Pagination

Check for multiple pages:

```javascript
() => {
  const pagination = document.querySelectorAll('.pagination a, [class*="page"] a');
  return Array.from(pagination).map(a => ({text: a.textContent, href: a.href}));
}
```

Navigate to each page and collect URLs.
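
Pagination bars usually repeat links ("2", "Next") and may include links that leave the gallery. The sketch below turns the `{text, href}` list returned above into an ordered queue of pages still to visit. The function name `page_queue` and the same-host rule are assumptions, chosen to stay within the selected gallery as the Safety Boundaries require.

```python
from urllib.parse import urldefrag, urlparse


def page_queue(links, current_url):
    """Return unique same-host page URLs from links, excluding the current page."""
    current = urldefrag(current_url).url
    host = urlparse(current).netloc
    seen = {current}
    queue = []
    for link in links:
        # Ignore fragments so "#top" variants of a page count as the same page
        href = urldefrag(link.get('href') or '').url
        if not href or urlparse(href).netloc != host:
            continue  # skip empty links and links to other hosts
        if href not in seen:
            seen.add(href)
            queue.append(href)
    return queue
```

Visit each queued page in the attached tab, rerun the extraction snippet, and append the results to the same URL list.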

### 4b. Batch scrape multiple galleries (iframe trick)

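When you need several galleries quickly and cannot automate CDP, you can load each gallery in a hidden iframe and extract the `data-max` URLs. This only works when the galleries are same-origin with the attached page and the site does not block framing (for example via `X-Frame-Options`):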

```javascript
async () => {
  const urls = [
    'https://site.example/galleries/view/123',
    'https://site.example/galleries/view/456'
  ];
  const results = [];
  for (const url of urls) {
    const iframe = document.createElement('iframe');
    iframe.style.position = 'fixed';
    iframe.style.left = '-9999px';
    iframe.style.width = '800px';
    iframe.style.height = '600px';
    iframe.src = url;
    document.body.appendChild(iframe);
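    try {
      await new Promise((resolve, reject) => {
        const t = setTimeout(() => reject(new Error('timeout load')), 20000);
        iframe.onload = () => { clearTimeout(t); resolve(); };
      });
      // contentDocument is null when the frame is cross-origin
      const doc = iframe.contentDocument;
      if (!doc) throw new Error('cross-origin frame');
      const start = Date.now();
      let imgs = [];
      // Poll for up to 20s, since images may be injected after the load event
      while (Date.now() - start < 20000) {
        imgs = Array.from(doc.querySelectorAll('img[data-max]')).map(i => i.dataset.max);
        if (imgs.length) break;
        await new Promise(r => setTimeout(r, 500));
      }
      results.push({ id: url.split('/').pop(), urls: imgs });
    } catch (err) {
      // Record the failure and continue with the next gallery
      results.push({ id: url.split('/').pop(), urls: [], error: String(err) });
    } finally {
      iframe.remove();
    }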
  }
  return results;
}
```

### 5. Check CDN Access

Test whether the CDN requires authentication or only a Referer header:

```bash
# Test direct access
curl -I "CDN_URL" 2>/dev/null | head -3

# Test with Referer
curl -I -H "Referer: https://SITE_DOMAIN/" "CDN_URL" 2>/dev/null | head -3
```
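
The two status codes from these checks decide how to download. A small Python sketch of that decision follows; the function name `cdn_strategy` and the returned labels are illustrative assumptions, and the mapping is a heuristic rather than a rule every CDN follows.

```python
def cdn_strategy(direct_status, referer_status):
    """Map the HTTP status of the direct and Referer checks to a download strategy."""
    if 200 <= direct_status < 300:
        return 'direct'        # CDN is open, no extra headers needed
    if 200 <= referer_status < 300:
        return 'referer'       # CDN only checks the Referer header
    if referer_status in (401, 403):
        return 'auth'          # CDN likely needs the logged-in session
    if referer_status == 429:
        return 'rate_limited'  # back off before testing again
    return 'unknown'
```

For example, a 403 on the direct request and a 200 with the Referer header means the bulk download only needs the `Referer` header shown in the next step.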

### 6. Bulk Download

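Collect the URLs into a text file, then download in parallel:

```bash
# Create output directory
mkdir -p ~/Downloads/gallery_name

# Download with Referer header, up to 8 at a time (wait -n needs bash 4.3+)
# urls.txt is read from the output directory
cd ~/Downloads/gallery_name || exit 1
while IFS= read -r url; do
  [ -z "$url" ] && continue
  filename=$(basename "${url%%\?*}")   # drop any query string from the name
  # -f makes curl fail on HTTP errors instead of saving the error page
  curl -sf -H "Referer: https://SITE_DOMAIN/" -o "$filename" "$url" &
  [ "$(jobs -r | wc -l)" -ge 8 ] && wait -n
done < urls.txt
wait
```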

**Python ThreadPool fallback (avoids shell quoting + wait -n issues):**

```python
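import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import urlparse

import requests

outdir = os.path.expanduser('~/Downloads/gallery_name')
os.makedirs(outdir, exist_ok=True)
headers = {'Referer': 'https://SITE_DOMAIN/', 'User-Agent': 'Mozilla/5.0'}

with open('urls.txt') as f:
    urls = [line.strip() for line in f if line.strip()]

def download(url):
    # Name the file after the URL path so query strings stay out of filenames
    filename = os.path.join(outdir, os.path.basename(urlparse(url).path))
    if os.path.exists(filename) and os.path.getsize(filename) > 0:
        return  # already downloaded
    r = requests.get(url, headers=headers, timeout=60)
    r.raise_for_status()
    with open(filename, 'wb') as out:
        out.write(r.content)

with ThreadPoolExecutor(max_workers=8) as ex:
    futures = {ex.submit(download, url): url for url in urls}
    for fut in as_completed(futures):
        try:
            fut.result()
        except Exception as exc:
            # Surface failures (403, 429, timeouts) instead of dropping them silently
            print(f'FAILED {futures[fut]}: {exc}')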
```
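
The Safety Boundaries ask that blocked downloads are surfaced rather than retried indefinitely. One way to do that is a bounded retry with exponential backoff, sketched below. The names `fetch_with_retry` and `DownloadBlocked`, the set of retryable status codes, and the `fetch(url) -> (status, body)` callable are all assumptions for illustration.

```python
import time

# Statuses treated as transient here; this choice is an assumption
RETRYABLE = {429, 500, 502, 503, 504}


class DownloadBlocked(Exception):
    """Raised when a download should be reported to the user, not retried."""


def fetch_with_retry(fetch, url, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) -> (status, body), retrying transient statuses with backoff."""
    for attempt in range(1, max_attempts + 1):
        status, body = fetch(url)
        if 200 <= status < 300:
            return body
        if status not in RETRYABLE:
            # Auth failures such as 401/403 will not fix themselves
            raise DownloadBlocked(f'{url}: HTTP {status}, not retrying')
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    raise DownloadBlocked(f'{url}: still HTTP {status} after {max_attempts} attempts')
```

With `requests`, `fetch` can be a small wrapper that calls `requests.get` and returns `(r.status_code, r.content)`. Catch `DownloadBlocked` in the worker and report it instead of looping.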

## Handling Lock Buttons

Some galleries have "lock" buttons to reveal hidden content. Look for:

```javascript
// Find lock/unlock buttons
() => {
  const locks = document.querySelectorAll(
    '[class*="lock"], [class*="unlock"], ' +
    'button[title*="lock"], .premium-unlock'
  );
  return Array.from(locks).map(el => ({
    tag: el.tagName,
    class: el.className,
    text: el.innerText?.substring(0, 30)
  }));
}
```

Click each lock button before extracting URLs.

## Output Organization

Optionally organize by gallery:

```bash
# Derive a gallery-specific folder name from the selected URL
mkdir -p "gallery_<id>"
```
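
A minimal Python sketch of deriving that folder name follows. It assumes the gallery id is the last path segment of the URL, as in the `https://site.example/galleries/view/123` examples above. The helper name `gallery_folder` is hypothetical.

```python
import re
from urllib.parse import urlparse


def gallery_folder(url):
    """Build a filesystem-safe folder name from the last path segment of url."""
    path = urlparse(url).path.rstrip('/')
    gallery_id = path.rsplit('/', 1)[-1] or 'unknown'
    # Replace anything outside a conservative character set
    safe_id = re.sub(r'[^A-Za-z0-9._-]', '_', gallery_id)
    return f'gallery_{safe_id}'
```

Sites that carry the id in a query parameter instead of the path would need a different rule.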

## Troubleshooting

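- **403 Forbidden**: Add the Referer header; if that is not enough, the CDN likely needs the session cookies from the browser
- **Rate limited**: Reduce the number of parallel downloads and add delays between requests
- **Missing images**: Check for JavaScript-loaded content; the page may need to be scrolled (scroll injection) before all images appear
- **Login required for CDN**: Extract session cookies via `document.cookie` (HttpOnly cookies are not visible there); use them only for the current run and do not write them to output files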
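
When a download is blocked, the server may still answer with an HTML error or login page that gets saved under an image filename. A quick check of the file signature can catch this. The sketch below tests the well-known magic bytes for JPEG, PNG, GIF, and WebP; the helper name `looks_like_image` is an assumption, and other formats would need their own signatures.

```python
def looks_like_image(data):
    """Return True if data starts with a JPEG, PNG, GIF, or WebP signature."""
    if data.startswith(b'\xff\xd8\xff'):           # JPEG
        return True
    if data.startswith(b'\x89PNG\r\n\x1a\n'):      # PNG
        return True
    if data.startswith((b'GIF87a', b'GIF89a')):    # GIF
        return True
    # WebP: RIFF container with "WEBP" at byte offset 8
    return data[:4] == b'RIFF' and data[8:12] == b'WEBP'
```

Reading the first 12 bytes of each downloaded file is enough for this check; files that fail it are candidates for the 403 and login cases above.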