seo-firecrawl
The seo-firecrawl skill integrates Firecrawl web scraping into Codex SEO workflows, enabling full-site crawls, single-page scrapes, site mapping, and content search. Use it for comprehensive SEO audits, section-specific analysis, broken link detection, and content inventory when the Firecrawl extension is installed and its MCP server is connected.
git clone --depth 1 https://github.com/AgriciDaniel/codex-seo /tmp/seo-firecrawl && cp -r /tmp/seo-firecrawl/skills/seo-firecrawl ~/.claude/skills/seo-firecrawl
# Firecrawl Extension for Codex SEO
## Shared Data Cache
**Step 0 -- Check shared data cache:**
Before gathering, check `.seo-cache/` for reusable context from related SEO skills.
Reference: `../seo/references/shared-data-cache.md` for schemas and dependency map.
Check these cache files when present:
- `.seo-cache/site-meta.json` for domain, business type, industry, and crawl context
- `.seo-cache/audit-scores.json` for prior full-audit priorities
- `.seo-cache/pages/{url-slug}/page-analysis.json` for page-level context when a URL is provided
- If found: parse and use clearly valid fields (note "Using cached [X] from [date]")
- If missing, corrupt, or irrelevant: continue with fresh evidence
- If the user says "refresh" or "re-run": ignore cache reads and overwrite on write
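
The cache rules above can be sketched as a small loader. This is a minimal illustration, not part of the skill itself: the `read_cache` name and its parameters are hypothetical, and the only validity check it performs is that the file parses as a JSON object.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional

CACHE_DIR = Path(".seo-cache")


def read_cache(relative_path: str, refresh: bool = False,
               cache_dir: Path = CACHE_DIR) -> Optional[Dict[str, Any]]:
    """Return parsed cache contents, or None when the cache should not be used.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if refresh:
        # User said "refresh" or "re-run": skip cache reads entirely.
        return None
    path = cache_dir / relative_path
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing, unreadable, or corrupt file: continue with fresh evidence.
        return None
    # Only a JSON object is treated as usable context.
    return data if isinstance(data, dict) else None
```

A caller would try `read_cache("site-meta.json")` first and gather fresh evidence whenever it returns `None`.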
This skill requires the Firecrawl extension to be installed:
```bash
./extensions/firecrawl/install.sh
```
**Check availability:** Before using any Firecrawl tool, verify the MCP server
is connected by checking if `firecrawl_scrape` or any Firecrawl tool
is available. If tools are not available, inform the user the extension is not
installed and provide install instructions.
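
As a rough sketch of the availability check, assume the client can provide a list of the tool names it currently exposes. How that list is obtained depends on the client and is not covered here. The function names below are hypothetical.

```python
from typing import Iterable

FIRECRAWL_PREFIX = "firecrawl_"
INSTALL_COMMAND = "./extensions/firecrawl/install.sh"


def firecrawl_available(tool_names: Iterable[str]) -> bool:
    """True when at least one Firecrawl tool is exposed by the client."""
    return any(name.startswith(FIRECRAWL_PREFIX) for name in tool_names)


def availability_message(tool_names: Iterable[str]) -> str:
    """Short status line to show the user before running any command."""
    if firecrawl_available(tool_names):
        return "Firecrawl tools detected."
    return ("Firecrawl extension is not installed or its MCP server is not "
            f"connected. Install it with: {INSTALL_COMMAND}")
```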
## Quick Reference
| Command | Purpose |
|---------|---------|
| `/seo firecrawl crawl <url>` | Full-site crawl with content extraction |
| `/seo firecrawl map <url>` | Discover site structure (URLs only, fast) |
| `/seo firecrawl scrape <url>` | Single-page scrape with JS rendering |
| `/seo firecrawl search <query> <url>` | Search within a crawled site |
## Commands
### crawl -- Full-Site Crawl
Crawl an entire website starting from the given URL. Returns page content,
metadata, and links for all discovered pages.
**MCP Tool:** `firecrawl_crawl`
**Parameters:**
- `url` (required): Starting URL to crawl
- `limit`: Max pages to crawl (default: 100, max: 500)
- `maxDepth`: Max link depth from start URL (default: 3)
- `includePaths`: Array of glob patterns to include (e.g., `["/blog/*"]`)
- `excludePaths`: Array of glob patterns to exclude (e.g., `["/admin/*", "/api/*"]`)
- `scrapeOptions.formats`: Output formats -- `["markdown", "html", "links"]`
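
To make the parameter list concrete, here is a sketch that assembles an argument dictionary using the names and limits listed above. The `build_crawl_args` helper is hypothetical, and how the resulting dictionary is handed to the `firecrawl_crawl` tool depends on the MCP client.

```python
from typing import Any, Dict, List, Optional

DEFAULT_LIMIT = 100      # default from the parameter list
MAX_LIMIT = 500          # maximum from the parameter list
DEFAULT_MAX_DEPTH = 3    # default from the parameter list


def build_crawl_args(url: str,
                     limit: int = DEFAULT_LIMIT,
                     max_depth: int = DEFAULT_MAX_DEPTH,
                     include_paths: Optional[List[str]] = None,
                     exclude_paths: Optional[List[str]] = None,
                     formats: Optional[List[str]] = None) -> Dict[str, Any]:
    """Assemble crawl arguments, clamping `limit` to the documented range."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must start with http:// or https://")
    args: Dict[str, Any] = {
        "url": url,
        "limit": max(1, min(limit, MAX_LIMIT)),
        "maxDepth": max_depth,
    }
    # Optional filters are only included when the caller supplies them.
    if include_paths:
        args["includePaths"] = list(include_paths)
    if exclude_paths:
        args["excludePaths"] = list(exclude_paths)
    if formats:
        args["scrapeOptions"] = {"formats": list(formats)}
    return args
```

For a section-focused crawl, `build_crawl_args("https://example.com", include_paths=["/blog/*"], formats=["markdown", "links"])` yields arguments limited to the blog section.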
**SEO Usage Patterns:**
1. **Comprehensive audit crawl**: Crawl full site, extract all pages for subagent analysis
2. **Section-focused crawl**: Use `includePaths` to audit only `/blog/*` or `/products/*`
3. **Broken link detection**: Crawl with `["links"]` format, check all hrefs for 404s
4. **Content inventory**: Extract all page titles, meta descriptions, H1s at scale
5. **SPA/JS-rendered sites**: Firecrawl renders JavaScript, solving the Issue #11 problem
**Example orchestration for `/seo audit`:**
```
1. firecrawl_map(url) -> get all URLs (fast, no content)
2. Filter to top 50 most important pages (homepage, key sections)
3. firecrawl_crawl(url, limit=50) -> get full content
4. Feed content to seo-technical, seo-content, seo-schema agents
```
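
Step 2 of the orchestration does not define "most important". One possible heuristic, shown here purely as an assumption, is to prefer shallow URLs (homepage first, then top-level sections) and break ties by URL length. The `select_top_pages` helper is hypothetical.

```python
from typing import Iterable, List
from urllib.parse import urlparse


def path_depth(url: str) -> int:
    """Number of non-empty path segments; the homepage has depth 0."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def select_top_pages(urls: Iterable[str], limit: int = 50) -> List[str]:
    """Pick up to `limit` URLs, shallowest first (assumed importance proxy)."""
    unique = list(dict.fromkeys(urls))  # de-duplicate, keep first-seen order
    ranked = sorted(unique, key=lambda u: (path_depth(u), len(u)))
    return ranked[:limit]
```

Other signals, such as internal link counts or traffic data, could replace the depth heuristic where they are available.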
**Cost awareness:**
- Free tier: 500 credits/month
- 1 credit = 1 page crawled or scraped
- Map operations are cheaper (0.5 credits per URL discovered)
- Always inform user of estimated credit usage before large crawls
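
The credit figures above translate into a simple estimate that can be shown to the user before a large crawl. This sketch uses only the rates stated in this section. The function names are hypothetical, and actual billing may differ from these numbers.

```python
from typing import Dict, Union

FREE_TIER_CREDITS = 500        # monthly free-tier allowance stated above
CREDITS_PER_PAGE = 1.0         # per page crawled or scraped
CREDITS_PER_MAPPED_URL = 0.5   # per URL discovered by a map operation


def estimate_credits(pages: int = 0, mapped_urls: int = 0) -> float:
    """Estimated credits for a run, using the rates listed in this section."""
    return pages * CREDITS_PER_PAGE + mapped_urls * CREDITS_PER_MAPPED_URL


def credit_summary(pages: int = 0, mapped_urls: int = 0,
                   used_this_month: float = 0.0) -> Dict[str, Union[float, bool]]:
    """Cost, free-tier credits left after the run, and whether it fits."""
    cost = estimate_credits(pages, mapped_urls)
    remaining = FREE_TIER_CREDITS - used_this_month - cost
    return {"cost": cost, "remaining": remaining, "fits_free_tier": remaining >= 0}
```

For example, mapping 342 URLs and then crawling 50 pages is estimated at 221 credits under these rates.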
### map -- Site Structure Discovery
Discover all URLs on a website without fetching content. Fast and credit-efficient.
**MCP Tool:** `firecrawl_map`
**Parameters:**
- `url` (required): Website URL to map
- `limit`: Max URLs to discover (default: 5000)
- `search`: Optional search term to filter URLs
**SEO Usage Patterns:**
1. **Sitemap comparison**: Map site, compare discovered URLs vs XML sitemap
2. **Orphan page detection**: URLs in sitemap but not linked from any page
3. **Crawl budget analysis**: Total indexable pages vs pages linked from homepage
4. **URL pattern analysis**: Identify URL structure patterns, duplicates, parameter bloat
5. **Pre-audit discovery**: Run map first, then targeted crawl on key sections
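
The sitemap comparison in patterns 1 and 2 reduces to two set differences once both URL lists are normalized. The sketch below assumes the sitemap URLs have already been extracted from the XML. The helper names and the normalization rules (lowercase host, no trailing slash, query string dropped) are illustrative choices.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Lowercase scheme and host, drop trailing slash, query, and fragment."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"


def compare_with_sitemap(mapped_urls: Iterable[str],
                         sitemap_urls: Iterable[str]) -> Dict[str, List[str]]:
    """Report URLs found by mapping but not listed, and listed but not found."""
    mapped = {normalize(u) for u in mapped_urls}
    listed = {normalize(u) for u in sitemap_urls}
    return {
        "missing_from_sitemap": sorted(mapped - listed),
        # Candidates only: confirm with link data before calling them orphans.
        "orphan_candidates": sorted(listed - mapped),
    }
```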
**Output:** Array of URLs. Present as:
```
Site: example.com
Pages discovered: 342
URL Pattern Breakdown:
/blog/* - 128 pages (37%)
/products/* - 89 pages (26%)
/category/* - 45 pages (13%)
/pages/* - 32 pages (9%)
/ (root pages) - 48 pages (14%)
```
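
A breakdown like the one above can be derived from the mapped URL array by grouping on the first path segment. In this sketch, pages with zero or one path segment are counted as root pages. That grouping rule is an assumption, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit

ROOT_LABEL = "/ (root pages)"


def first_segment_pattern(url: str) -> str:
    """Map a URL to a pattern such as '/blog/*' or the root-pages label."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) <= 1:
        return ROOT_LABEL
    return f"/{segments[0]}/*"


def pattern_breakdown(urls: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (pattern, page count, rounded percentage), largest group first."""
    counts = Counter(first_segment_pattern(u) for u in urls)
    total = sum(counts.values())
    return [(p, n, round(100 * n / total)) for p, n in counts.most_common()]


def format_breakdown(site: str, urls: List[str]) -> str:
    """Render the breakdown in the layout shown above."""
    lines = [f"Site: {site}", f"Pages discovered: {len(urls)}",
             "URL Pattern Breakdown:"]
    for pattern, n, pct in pattern_breakdown(urls):
        lines.append(f"{pattern} - {n} pages ({pct}%)")
    return "\n".join(lines)
```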
### scrape -- Single-Page Deep Scrape
Scrape a single page with full JavaScript rendering. More thorough than
`fetch_page.py` because it executes JS and waits for dynamic content.
**MCP Tool:** `firecrawl_scrape`
**Parameters:**
- `url` (required): Page URL to scrape
- `formats`: Output formats -- `["markdown", "html", "links", "screenshot"]`
- `onlyMainContent`: Strip nav/footer/sidebar (default: true)
- `waitFor`: CSS selector or milliseconds to wait for content
- `timeout`: Request timeout in ms (default: 30000)
- `actions`: Browser actions before scraping (click, scroll, wait)
**SEO Usage Patterns:**
1. **SPA content extraction**: Scrape JS-rendered React/Vue/Angular pages
2. **Dynamic content audit**: Pages with lazy-loaded content below the fold
3. **Paywall/login detection**: Identify content behind authentication walls
4. **Main content extraction**: Use `onlyMainContent` for clean E-E-A-T analysis
5. **Screenshot capture**: Use `screenshot` format for visual analysis
**When to use scrape vs fetch_page.py:**
| Scenario | Use |
|----------|-----|
| Static HTML page | `fetch_page.py` (no API cost) |
| JS-rendered SPA | `firecrawl_scrape` (renders JS) |
| Need response headers | `fetch_page.py` (returns headers) |
| Need clean markdown | `firecrawl_scrape` (better extraction) |
| Rate-limited/blocked | `firecrawl_scrape` (handles anti-bot) |
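
The decision table can be expressed as a small selector. This is a sketch with a hypothetical `choose_tools` helper. The table does not say what to do when a page needs both JS rendering and response headers, so returning both tools in that case is an assumption.

```python
from typing import List


def choose_tools(js_rendered: bool = False, need_headers: bool = False,
                 need_markdown: bool = False, blocked: bool = False) -> List[str]:
    """Pick the fetch tool(s) for a page based on the scenarios in the table."""
    tools = set()
    if js_rendered or need_markdown or blocked:
        tools.add("firecrawl_scrape")
    if need_headers:
        tools.add("fetch_page.py")
    if not tools:
        # Plain static HTML: prefer the option with no API cost.
        tools.add("fetch_page.py")
    return sorted(tools)
```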
### search -- Site-Scoped Search
Search within a website for specific content. Useful for finding pages
related to a topic without crawling everything.
**MCP Tool:** `firecrawl_search`
**Parameters:**
- `query` (required): Search query
- `url` (required): Website to search within
- `limit`: Max results (default: 10)
- `scrapeOptions.formats`: Output format for matched pages
**SEO Usage Patterns:**
1. **Content gap validation**: Search for a keyword on the site to check if con…