seo-firecrawl
The seo-firecrawl skill enables Claude to crawl and scrape websites using the Firecrawl service, extracting page content, metadata, and site structure for SEO analysis. Use it to perform full-site audits, discover site structure, scrape individual pages with JavaScript rendering, search within crawled content, and analyze large-scale content inventories when SEO assessment requires comprehensive website data retrieval.
git clone --depth 1 https://github.com/Infrasity-Labs/dev-gtm-claude-skills /tmp/seo-firecrawl && cp -r /tmp/seo-firecrawl/.claude/extensions/firecrawl/skills/seo-firecrawl ~/.claude/skills/seo-firecrawl
SKILL.md
# Firecrawl Extension for Claude SEO

This skill requires the Firecrawl extension to be installed:

```bash
./extensions/firecrawl/install.sh
```

**Check availability:** Before using any Firecrawl tool, verify the MCP server is connected by checking if `firecrawl_scrape` or any Firecrawl tool is available. If tools are not available, inform the user the extension is not installed and provide install instructions.

## Quick Reference

| Command | Purpose |
|---------|---------|
| `/seo firecrawl crawl <url>` | Full-site crawl with content extraction |
| `/seo firecrawl map <url>` | Discover site structure (URLs only, fast) |
| `/seo firecrawl scrape <url>` | Single-page scrape with JS rendering |
| `/seo firecrawl search <query> <url>` | Search within a crawled site |

## Commands

### crawl -- Full-Site Crawl

Crawl an entire website starting from the given URL. Returns page content, metadata, and links for all discovered pages.

**MCP Tool:** `firecrawl_crawl`

**Parameters:**
- `url` (required): Starting URL to crawl
- `limit`: Max pages to crawl (default: 100, max: 500)
- `maxDepth`: Max link depth from start URL (default: 3)
- `includePaths`: Array of glob patterns to include (e.g., `["/blog/*"]`)
- `excludePaths`: Array of glob patterns to exclude (e.g., `["/admin/*", "/api/*"]`)
- `scrapeOptions.formats`: Output formats -- `["markdown", "html", "links"]`

**SEO Usage Patterns:**
1. **Comprehensive audit crawl**: Crawl full site, extract all pages for subagent analysis
2. **Section-focused crawl**: Use `includePaths` to audit only `/blog/*` or `/products/*`
3. **Broken link detection**: Crawl with `["links"]` format, check all hrefs for 404s
4. **Content inventory**: Extract all page titles, meta descriptions, H1s at scale
5. **SPA/JS-rendered sites**: Firecrawl renders JavaScript, solving the Issue #11 problem

**Example orchestration for `/seo audit`:**

```
1. firecrawl_map(url) -> get all URLs (fast, no content)
2. Filter to top 50 most important pages (homepage, key sections)
3. firecrawl_crawl(url, limit=50) -> get full content
4. Feed content to seo-technical, seo-content, seo-schema agents
```

**Cost awareness:**
- Free tier: 500 credits/month
- 1 credit = 1 page crawled or scraped
- Map operations are cheaper (0.5 credits per URL discovered)
- Always inform user of estimated credit usage before large crawls

### map -- Site Structure Discovery

Discover all URLs on a website without fetching content. Fast and credit-efficient.

**MCP Tool:** `firecrawl_map`

**Parameters:**
- `url` (required): Website URL to map
- `limit`: Max URLs to discover (default: 5000)
- `search`: Optional search term to filter URLs

**SEO Usage Patterns:**
1. **Sitemap comparison**: Map site, compare discovered URLs vs XML sitemap
2. **Orphan page detection**: URLs in sitemap but not linked from any page
3. **Crawl budget analysis**: Total indexable pages vs pages linked from homepage
4. **URL pattern analysis**: Identify URL structure patterns, duplicates, parameter bloat
5. **Pre-audit discovery**: Run map first, then targeted crawl on key sections

**Output:** Array of URLs. Present as:

```
Site: example.com
Pages discovered: 342

URL Pattern Breakdown:
  /blog/*        - 128 pages (37%)
  /products/*    - 89 pages (26%)
  /category/*    - 45 pages (13%)
  /pages/*       - 32 pages (9%)
  / (root pages) - 48 pages (14%)
```

### scrape -- Single-Page Deep Scrape

Scrape a single page with full JavaScript rendering. More thorough than `fetch_page.py` because it executes JS and waits for dynamic content.

**MCP Tool:** `firecrawl_scrape`

**Parameters:**
- `url` (required): Page URL to scrape
- `formats`: Output formats -- `["markdown", "html", "links", "screenshot"]`
- `onlyMainContent`: Strip nav/footer/sidebar (default: true)
- `waitFor`: CSS selector or milliseconds to wait for content
- `timeout`: Request timeout in ms (default: 30000)
- `actions`: Browser actions before scraping (click, scroll, wait)

**SEO Usage Patterns:**
1. **SPA content extraction**: Scrape JS-rendered React/Vue/Angular pages
2. **Dynamic content audit**: Pages with lazy-loaded content below the fold
3. **Paywall/login detection**: Identify content behind authentication walls
4. **Main content extraction**: Use `onlyMainContent` for clean E-E-A-T analysis
5. **Screenshot capture**: Use `screenshot` format for visual analysis

**When to use scrape vs fetch_page.py:**

| Scenario | Use |
|----------|-----|
| Static HTML page | `fetch_page.py` (no API cost) |
| JS-rendered SPA | `firecrawl_scrape` (renders JS) |
| Need response headers | `fetch_page.py` (returns headers) |
| Need clean markdown | `firecrawl_scrape` (better extraction) |
| Rate-limited/blocked | `firecrawl_scrape` (handles anti-bot) |

### search -- Site-Scoped Search

Search within a website for specific content. Useful for finding pages related to a topic without crawling everything.

**MCP Tool:** `firecrawl_search`

**Parameters:**
- `query` (required): Search query
- `url` (required): Website to search within
- `limit`: Max results (default: 10)
- `scrapeOptions.formats`: Output format for matched pages

**SEO Usage Patterns:**
1. **Content gap validation**: Search for a keyword on the site to check if content exists
2. **Internal linking opportunities**: Find pages mentioning a topic that could link to each other
3. **Duplicate content detection**: Search for key phrases to find near-duplicates
4. **Competitor content research**: Search competitor site for specific topics

## Cross-Skill Integration

### With seo-audit (full audit)

When Firecrawl is available during `/seo audit`:
1. Use `firecrawl_map` to discover all site URLs
2. Compare with XML sitemap (seo-sitemap) to find orphan/missing pages
3. Select top pages for deep analysis
4. Feed crawled content to all subagents (technical, content, schema, geo)
5. Report total crawlable pages, URL patterns, and crawl depth

### With seo-technical

- Broken link detection: crawl all internal links
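
The URL pattern breakdown shown for `map` output and the credit rules under "Cost awareness" can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: the function names `pattern_breakdown` and `estimate_credits` are hypothetical, the grouping rule (first path segment, with single-segment paths counted as root pages) is an assumption about how the breakdown is built, and the credit figures are taken as stated in this document rather than from current Firecrawl pricing.

```python
from collections import Counter
from urllib.parse import urlparse

# Figures as stated in the "Cost awareness" notes; treat them as assumptions
# that may not match current Firecrawl pricing.
CREDITS_PER_PAGE = 1.0
CREDITS_PER_MAPPED_URL = 0.5
FREE_TIER_CREDITS = 500


def pattern_breakdown(urls):
    """Group URLs by first path segment; return (pattern, count, percent) rows."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # Assumption: "/" and single-segment paths like "/about" are root pages.
        key = f"/{segments[0]}/*" if len(segments) > 1 else "/ (root pages)"
        counts[key] += 1
    total = sum(counts.values())
    # Largest groups first; percentages are rounded to whole numbers.
    return [(p, n, round(100 * n / total)) for p, n in counts.most_common()]


def estimate_credits(mapped_urls, pages_to_crawl):
    """Estimate credits for a map pass followed by a limited crawl."""
    return mapped_urls * CREDITS_PER_MAPPED_URL + pages_to_crawl * CREDITS_PER_PAGE


if __name__ == "__main__":
    sample = [
        "https://example.com/",
        "https://example.com/about",
        "https://example.com/blog/a",
        "https://example.com/blog/b",
        "https://example.com/products/x",
    ]
    for pattern, count, pct in pattern_breakdown(sample):
        print(f"{pattern:<16} - {count} pages ({pct}%)")

    needed = estimate_credits(mapped_urls=342, pages_to_crawl=50)
    print(f"Estimated credits: {needed} of {FREE_TIER_CREDITS} free-tier credits")
```

Running the estimate before a large crawl gives a number to report to the user, in line with the instruction to always state expected credit usage up front.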