Skill | 134 repo stars | updated yesterday

fetching-blocked-urls

This skill retrieves readable markdown content from URLs that standard web fetching cannot access, using Jina AI's reader service to bypass blocks, render JavaScript, and handle paywalls. Use it as a fallback when web_fetch fails with a 403 Forbidden response, a timeout, or a JavaScript-rendering failure. It retries automatically to maximize the success rate before asking the user to intervene manually.

Repository: claude-skills

Install in Claude Code

git clone --depth 1 https://github.com/oaustegard/claude-skills /tmp/fetching-blocked-urls && cp -r /tmp/fetching-blocked-urls/fetching-blocked-urls ~/.claude/skills/fetching-blocked-urls

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Fetching Blocked URLs

Retrieve readable content from URLs that web_fetch cannot access. Jina AI's reader service renders JavaScript, bypasses soft blocks, and returns clean markdown.

## Activation Triggers

Invoke this skill immediately when web_fetch returns:
- 403 Forbidden or access denied
- Paywall or login wall indicators
- Empty, garbled, or truncated content
- JavaScript-heavy SPA failures
- Timeout errors

## Core Command

```bash
curl -s --max-time 30 "https://r.jina.ai/TARGET_URL"
```

The service returns markdown with page title, body text, and preserved links.
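
Because `TARGET_URL` is a placeholder, a small wrapper can make the substitution explicit. This is a minimal sketch, assuming the target URL is appended to the reader prefix as-is, exactly as in the command above. The names `jina_reader_url` and `fetch_via_jina` are hypothetical helpers, not part of the skill.

```bash
# Hypothetical helpers (not part of the skill): build the reader URL and fetch it.

# Prefix a target URL with the reader endpoint used in the core command.
jina_reader_url() {
  local target="$1"
  if [ -z "$target" ]; then
    echo "usage: jina_reader_url URL" >&2
    return 2
  fi
  printf 'https://r.jina.ai/%s\n' "$target"
}

# Fetch a page through the reader with the same curl flags as the core command.
fetch_via_jina() {
  local url
  url=$(jina_reader_url "$1") || return
  curl -s --max-time 30 "$url"
}
```

For example, `fetch_via_jina "https://example.com/article"` would request `https://r.jina.ai/https://example.com/article`.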

## Retry Pattern

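Jina's backend fails intermittently on roughly 10% of requests. Retrying up to three times raises the success rate above 99%: if failures are independent, the chance that all three attempts fail is about 0.1 × 0.1 × 0.1 = 0.001, or 0.1%.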

```bash
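for attempt in 1 2 3; do
  # Retry when curl fails (e.g. timeout), the body is empty, or Jina reports an upstream error
  if result=$(curl -s --max-time 30 "https://r.jina.ai/TARGET_URL") && [ -n "$result" ] &&
     ! echo "$result" | grep -q "upstream connect error"; then
    echo "$result"
    break
  fi
  [ "$attempt" -lt 3 ] && sleep 1
done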
```
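
The loop above can also be packaged as a reusable function that reports exhaustion through its exit status, which is one way to signal that it is time to escalate to the user. This is a sketch under stated assumptions: `fetch_with_retry`, `FETCH_CMD`, and `RETRY_DELAY` are hypothetical names introduced here, and the override hook exists only so the loop can be exercised without network access.

```bash
# Hypothetical wrapper (not part of the skill): retry a reader fetch up to N times.
# FETCH_CMD and RETRY_DELAY are assumed override hooks for testing; by default the
# function runs the same curl command as the skill and sleeps 1 second between tries.
fetch_with_retry() {
  local url="$1" max_attempts="${2:-3}" attempt=1 result
  while [ "$attempt" -le "$max_attempts" ]; do
    # Succeed only if the command exits 0, returns a non-empty body,
    # and the body does not contain the upstream error marker.
    if result=$(${FETCH_CMD:-curl -s --max-time 30} "https://r.jina.ai/$url") &&
       [ -n "$result" ] &&
       ! printf '%s\n' "$result" | grep -q "upstream connect error"; then
      printf '%s\n' "$result"
      return 0
    fi
    [ "$attempt" -lt "$max_attempts" ] && sleep "${RETRY_DELAY:-1}"
    attempt=$((attempt + 1))
  done
  echo "fetch failed after $max_attempts attempts: $url" >&2
  return 1
}
```

A caller might write `fetch_with_retry "https://example.com/article" || echo "ask the user to paste the content"`.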

## Workflow Integration

1. **Primary**: Use web_fetch (native tool)
2. **Fallback**: This skill with retry when web_fetch fails
3. **Escalate**: Request user assistance only after retry exhaustion

Attempt this fallback before asking users to copy-paste content manually.

## Output Format

Jina returns structured markdown:
- `Title:` page title
- `URL Source:` original URL
- `Markdown Content:` extracted body text, links preserved
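
The labelled fields listed above can be separated with standard text tools. The following is a rough sketch, assuming each label begins its own line and that the body is everything after the first `Markdown Content:` label. `jina_field` and `jina_body` are hypothetical helper names, not part of the skill.

```bash
# Hypothetical helpers (not part of the skill) for splitting a reader response.
# Assumes each label starts its own line and the body follows "Markdown Content:".

# Print the value of a single-line field such as "Title" or "URL Source".
jina_field() {
  sed -n "s/^$1: *//p" | head -n 1
}

# Print everything after the first "Markdown Content:" label.
jina_body() {
  awk 'found { print; next }
       /^Markdown Content:/ {
         found = 1
         sub(/^Markdown Content: */, "")
         if (length($0)) print   # keep any text sharing the label line
       }'
}
```

Both helpers read from standard input, so a saved response can be piped in: `printf '%s\n' "$response" | jina_field "Title"`.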

## Limitations

- Long pages may truncate
- Sites blocking all scrapers remain inaccessible
- Login-required content limited to public portions
- Real-time dynamic content may not render

## Domain Access

`r.jina.ai` is whitelisted in the Claude container's network configuration.