Skill124 repo starsupdated today
servo-fetch
Fetch and render web pages using the Servo browser engine — a single binary with JS execution, CSS layout, screenshots, and content extraction. Use when a URL returns empty or incomplete content with plain HTTP fetch, when you need a screenshot without GPU, or when you need to run JavaScript in a page context. No browser download required.
Install in Claude Code
Copygit clone --depth 1 https://github.com/konippi/servo-fetch /tmp/servo-fetch && cp -r /tmp/servo-fetch/skills/servo-fetch ~/.claude/skills/servo-fetchThen start a new Claude Code session; the skill loads automatically.
Definition
SKILL.md
# servo-fetch
## When to use
- A URL returns empty or incomplete content with simple HTTP fetch (SPA, React, Vue)
- You need a screenshot of a web page in CI/Docker (no GPU available)
- You need to evaluate JavaScript in a page context (DOM queries, data extraction)
- You want clean Markdown from a documentation site, blog, or article
- You need to crawl an entire documentation site or blog for RAG / knowledge ingestion
- You need the accessibility tree with bounding boxes for a page
## When NOT to use
- The page is simple static HTML (use `curl` or built-in web fetch instead)
- You need to interact with the page (click, fill forms) — servo-fetch is read-only
- You need full Chromium compatibility for complex web apps
## Tools (MCP)
Start the MCP server: `servo-fetch mcp` (stdio) or `servo-fetch mcp --port 8080` (Streamable HTTP)
### fetch
Extract readable content from a URL. JavaScript is executed, CSS layout is computed, and navigation noise (navbars, sidebars, footers, cookie banners) is stripped automatically.
Parameters:
- `url` (required): URL to fetch (http/https only)
- `format`: `"markdown"` (default), `"json"`, `"html"`, `"text"`, or `"accessibility_tree"`
- `selector`: CSS selector to extract a specific section instead of full-page extraction
- `max_length`: max characters to return (default 5000)
- `start_index`: character offset for pagination
- `timeout`: page load timeout in seconds (default 30)
- `settle_ms`: extra wait in ms after load event for SPAs (default 0, max 10000)
```text
fetch(url: "https://docs.rs/tokio", format: "markdown")
fetch(url: "https://example.com", format: "json", selector: "article")
fetch(url: "https://example.com", format: "accessibility_tree")
```
PDF URLs are auto-detected via Content-Type and extracted directly.
### batch_fetch
Fetch multiple URLs in parallel. Results are returned as separate content entries in completion order. Failed URLs are reported inline without aborting the batch.
Parameters:
- `urls` (required): array of URLs to fetch (http/https only, max 20)
- `format`: `"markdown"` (default) or `"json"`
- `selector`: CSS selector to extract a specific section
- `max_length`: max characters per URL result (default 5000)
- `timeout`: page load timeout in seconds per URL (default 30)
- `settle_ms`: extra wait in ms after load event (default 0, max 10000)
```text
batch_fetch(urls: ["https://a.com", "https://b.com"], format: "markdown")
batch_fetch(urls: ["https://a.com", "https://b.com"], format: "json", selector: "article")
```
### crawl
Crawl a website starting from a URL, following same-site links via BFS. JavaScript is executed, CSS layout is computed, and navigation noise is stripped. Respects robots.txt.
Parameters:
- `url` (required): starting URL to crawl (http/https only)
- `limit`: max pages to crawl (default 20, max 500)
- `max_depth`: max link depth from seed (default 3, max 10)
- `format`: `"markdown"` (default) or `"json"`
- `include_glob`: URL path patterns to include (e.g. `["/docs/**"]`)
- `exclude_glob`: URL path patterns to exclude
- `max_length`: max characters per page result (default 5000)
- `timeout`: page load timeout in seconds per page (default 30)
- `settle_ms`: extra wait in ms after load event (default 0, max 10000)
- `selector`: CSS selector to extract a specific section per page
```text
crawl(url: "https://docs.example.com", limit: 20, max_depth: 3)
crawl(url: "https://docs.example.com", include_glob: ["/guide/**"], limit: 50)
```
### screenshot
Capture a PNG screenshot. Uses Servo's software renderer — works without GPU.
Parameters:
- `url` (required): URL to capture
- `full_page`: capture the full scrollable page (default false)
- `timeout`: page load timeout in seconds (default 30)
- `settle_ms`: extra wait in ms after load event (default 0, max 10000)
```text
screenshot(url: "https://example.com")
screenshot(url: "https://example.com", full_page: true)
```
### execute_js
Evaluate a JavaScript expression after the page loads. Console messages (log, warn, error) are appended to the result.
Parameters:
- `url` (required): URL to load
- `expression` (required): JavaScript expression to evaluate
- `timeout`: page load timeout in seconds (default 30)
- `settle_ms`: extra wait in ms after load event (default 0, max 10000)
```text
execute_js(url: "https://example.com", expression: "document.title")
execute_js(url: "https://example.com", expression: "[...document.querySelectorAll('h2')].map(e => e.textContent)")
```
## CLI
```bash
servo-fetch https://example.com # Markdown (default)
servo-fetch https://example.com --format json # Structured JSON
servo-fetch URL1 URL2 URL3 # Parallel batch (Markdown with separators)
servo-fetch URL1 URL2 --format json # Parallel batch (NDJSON)
servo-fetch https://example.com --format png -o out.png # Save PNG screenshot
servo-fetch https://example.com --js "document.title" # Run JavaScript and print result
servo-fetch https://example.com --selector article # Extract a section by CSS selector
servo-fetch https://example.com --schema schema.json # Schema-driven JSON
servo-fetch https://example.com --cookies cookies.txt # Send session cookies
servo-fetch https://example.com --format html # Raw HTML
servo-fetch https://example.com --format text # Plain text
servo-fetch https://example.com -t 60 # Custom timeout
servo-fetch https://example.com --settle 500 # Extra wait for SPAs
servo-fetch crawl https://docs.example.com --limit 20 # Crawl a site (BFS)
servo-fetch crawl https://docs.example.com --include "/docs/**" # Crawl with path filter
servo-fetch URL --output page.md # Save a single URL to a file
servo-fetch crawl URL --output-dir ./pages/