acrawl — LLM-powered web crawler. Describe what you want in plain English, get structured data back. Single Rust binary, 25 providers, MCP server built-in.
- ✓Open-source license (MIT)
- ✓Actively maintained (<30d)
- ✓Clear description
- ✓Topics declared
git clone https://github.com/Mingye-Lu/AgenticCrawler
{
  "mcpServers": {
    "agenticcrawler": {
      "command": "AgenticCrawler"
    }
  }
}
MCP Servers overview
<p align="center">
<pre align="center">
 █████╗  ██████╗██████╗  █████╗ ██╗    ██╗██╗
██╔══██╗██╔════╝██╔══██╗██╔══██╗██║    ██║██║
███████║██║     ██████╔╝███████║██║ █╗ ██║██║
██╔══██║██║     ██╔══██╗██╔══██║██║███╗██║██║
██║  ██║╚██████╗██║  ██║██║  ██║╚███╔███╔╝███████╗
╚═╝  ╚═╝ ╚═════╝╚═╝  ╚═╝╚═╝  ╚═╝ ╚══╝╚══╝ ╚══════╝
</pre>
</p>

<p align="center">
  <strong>LLM-powered web crawler.</strong> Describe what you want in plain English and get structured data back.
</p>

<p align="center">
  <a href="https://github.com/Mingye-Lu/AgenticCrawler/actions/workflows/ci.yml"><img src="https://github.com/Mingye-Lu/AgenticCrawler/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License: MIT"></a>
  <a href="https://www.rust-lang.org/"><img src="https://img.shields.io/badge/rust-2021_edition-orange.svg" alt="Rust"></a>
</p>

<p align="center">
  Single binary. No Python runtime. 29 tools. 25 LLM providers. MCP server built-in.
</p>

---

## Why acrawl?

Most web scraping still means writing code: XPath selectors, pagination logic, retry handling, anti-bot workarounds. LLMs can read pages like humans do, but wiring one up to a browser is a project in itself.

acrawl is that wiring, packaged as a single Rust binary. You describe a goal; the agent figures out which pages to visit, what to click, what to extract, and when it's done.

- **No code required.** Describe the goal in English. The agent plans and executes.
- **One binary, zero runtimes.** `cargo build --release` produces a self-contained executable. No Python, no Node runtime: just Rust and a Chromium download for browser automation.
- **Smart fetching.** Static pages are fetched over plain HTTP (fast). acrawl watches for signs that JavaScript or interaction is needed: JS framework markers (`__next_data__`, `__nuxt`, `__vue`, `ng-app`, React roots), auth redirects, and short `<noscript>` bodies. When it sees them, it transparently escalates to a headless browser.
- **29 tools, not a chatbot.** The agent has real tools (navigate, click, fill forms, run JS, take screenshots, switch device emulation, manage tabs, run deterministic scripts) plus a fork/join layer to spawn parallel sub-agents across multiple browser tabs.
- **25 LLM providers.** Anthropic, OpenAI, Google Gemini, DeepSeek, AWS Bedrock, Azure OpenAI, Vertex AI, GitHub Copilot, Groq, Mistral, xAI, Cohere, Alibaba DashScope, OpenRouter, and more. Or bring your own via any OpenAI-compatible endpoint.
- **MCP client.** Extend the agent with custom tools via [Model Context Protocol](https://modelcontextprotocol.io) servers (stdio, SSE, HTTP, WebSocket).
- **MCP server.** `acrawl mcp` exposes 25 browser tools plus an autonomous `run_goal` agent to any MCP-compatible client: Claude Code, Cursor, Windsurf, VS Code, Zed, JetBrains, TRAE, Gemini CLI, and more. Install with `acrawl mcp install`.

### How does it compare?

#### vs. AI web agents and scraping tools

| | acrawl | browser-use | Stagehand | Skyvern | Firecrawl | Playwright MCP | Scrapy | Playwright scripts |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| No code needed | Yes | No | No | Partial | No | No | No | No |
| Single binary | Yes | No | No | No | No | No | No | No |
| JS rendering | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes |
| LLM-powered navigation | Yes | Yes | Yes | Yes | Limited | No | No | No |
| No Python / Node needed | Yes | No | No | No | No | No | No | No |
| Form filling / interaction | Yes | Yes | Yes | Yes | No | Yes | No | Yes |
| Sub-agent parallelism | Yes | No | No | Partial | Partial | No | Partial | No |
| 25 LLM providers | Yes | Via LiteLLM | Partial | Partial | N/A | N/A | N/A | N/A |
| MCP client (use tools) | Yes | No | No | No | No | No | No | No |
| MCP server (expose as tools) | Yes | No | No | No | Yes | Yes | No | No |
| Stealth browser built-in | Yes | Cloud only | Via Browserbase | Cloud only | No | No | No | No |
| Open source | Yes | Yes (MIT) | Yes (MIT) | Yes (Apache) | Engine only | Yes (MIT) | Yes (BSD) | Yes (Apache) |

Notes:

- **browser-use** (85k+ GitHub stars): Python + Playwright, DOM + screenshots, supports GPT/Claude/Gemini/Ollama via LiteLLM, 89.1% WebVoyager. No single binary; requires Python and `pip install`. Every action calls an LLM: 2-5s/step, ~$0.02-0.30/task. Cloud tier adds stealth; self-hosted is bare Playwright.
- **Stagehand** (Browserbase, 21k+ stars): TypeScript + CDP (v3), mixes deterministic Playwright with AI primitives (`act()`, `extract()`, `observe()`). Action caching reuses successful clicks without re-calling the LLM. Requires Node and, for production, Browserbase cloud hosting.
- **Skyvern** (21k+ stars, Apache 2.0): vision-first (screenshot-only, no DOM), handles legacy portals and government forms that DOM tools struggle with. No-code cloud UI available. Each step costs vision-model tokens, roughly $0.10-0.50/task. 85.85% WebVoyager.
- **Firecrawl** (82k+ stars): managed scraping API. Returns LLM-ready Markdown, JSON extraction, site-wide crawl. Not an agentic tool; multi-step interaction is minimal. Ships an official MCP server. Per-page pricing from $19/month.
- **Playwright MCP** (Microsoft, 29k+ stars): MCP server that exposes browser control via the accessibility tree. Sub-100ms actions, zero vision tokens. It drives a browser on behalf of an LLM client rather than reasoning on its own, so there is no autonomous goal navigation. Used in GitHub Copilot Agent.

#### vs. native LLM provider browsing

Most AI providers offer some form of browsing, but it is designed for **conversational information retrieval**, not programmatic web automation.
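For context on the "Real JS-rendered browser" distinction, the smart-fetching behaviour described under "Why acrawl?" can be illustrated with a small heuristic. The sketch below is not acrawl's implementation (which is in Rust). It only shows one way to decide whether a plain HTTP response is enough. The marker strings are quoted from the feature list; `data-reactroot` as a stand-in for "React roots", the auth keywords, the redirect codes, and the 200-character threshold are all assumptions made for illustration.

```python
import re
from typing import Optional

# Markers quoted from the README; "data-reactroot" is an assumed stand-in for "React roots".
JS_MARKERS = ("__next_data__", "__nuxt", "__vue", "ng-app", "data-reactroot")
# Hypothetical substrings that suggest a redirect target is a login page.
AUTH_HINTS = ("login", "signin", "sign-in", "auth")
REDIRECT_CODES = {301, 302, 303, 307, 308}

_HIDDEN_BLOCKS = re.compile(r"<(script|style|noscript)\b.*?</\1>", re.I | re.S)
_TAGS = re.compile(r"<[^>]+>")


def visible_text_length(html: str) -> int:
    """Rough count of text characters left after dropping scripts, styles, noscript, and tags."""
    text = _TAGS.sub(" ", _HIDDEN_BLOCKS.sub(" ", html))
    return len(" ".join(text.split()))


def needs_browser(status: int, location: Optional[str], html: str,
                  min_text_chars: int = 200) -> bool:
    """Return True if the HTTP response looks like it needs a real browser."""
    # 1. Redirect to something that looks like an auth page (crude substring match).
    if status in REDIRECT_CODES and location:
        if any(hint in location.lower() for hint in AUTH_HINTS):
            return True
    lowered = html.lower()
    # 2. A known JS framework marker appears anywhere in the markup.
    if any(marker in lowered for marker in JS_MARKERS):
        return True
    # 3. The page carries a <noscript> fallback and almost no visible text.
    if "<noscript" in lowered and visible_text_length(html) < min_text_chars:
        return True
    return False
```

A caller would fetch the page over HTTP first, pass the status code, `Location` header, and body to `needs_browser`, and only start a headless browser when it returns `True`. Substring matching is deliberately crude here; a static article that merely mentions `__nuxt` in its text would also trigger escalation.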
Key constraints:

| | acrawl | ChatGPT Agent | Claude Computer Use | Claude in Chrome | Gemini Deep Research | Copilot / Edge |
|---|:---:|:---:|:---:|:---:|:---:|:---:|
| Real JS-rendered browser | Yes | Yes (sandboxed cloud VM) | Indirect (dev provides env) | Yes (your Chrome) | No (search API only) | Limited (Bing retrieval) |
| Click / fill forms | Yes | Yes (requires user confirmation) | Yes | Yes | No | Limited |
| Programmable / scriptable | Yes | No | Yes (API beta) | No | No | No |
| Sub-agent parallelism | Yes | No | No | No | No | No |
| MCP server (expose as tools) | Yes | No | No | No | No | No |
| Returns structured data | Yes | No (text summaries) | No (screenshots) | No | No | No |
| Stealth / anti-bot | Yes | No | No | No | No | No |
| No vendor lock-in | Yes (25 providers) | OpenAI only | Anthropic only | Anthropic only | Google only | OpenAI / Bing only |
| Runs without paid subscription | Yes (OSS; LLM key needed) | No (Plus/Pro/Business) | No (API cost) | No (Max plan) | Partial | Yes (free tier) |

Notes:

- **ChatGPT Agent** (OpenAI, July 2025): runs in a sandboxed cloud virtual machine with its own Chromium instance. Can browse, click, and fill forms but pauses for user confirmation on sensitive actions (purchases, logins). Uses two modes: a fast text browser for research queries and a visual browser for interaction. Cannot run code in the browser, install extensions, or access your local file system. Susceptible to prompt injection. Available to Plus/Pro/Business subscribers.
- **ChatGPT Atlas** (OpenAI, October 2025): a full Chromium browser with ChatGPT integrated as a sidebar + agent. Agent mode drives the same sandboxed cloud VM as ChatGPT Agent; core limitations are identical.
- **Claude Computer Use** (Anthropic API, beta since October 2024): screenshot + mouse/keyboard API for any desktop application, not just browsers. Vision-only, with no DOM access. Developers must provide and manage the entire computing environment (typically a Docker container with Xvfb + Firefox). Not a ready-to-use binary. Requires significant infrastructure to operate in production.
- **Claude in Chrome** (Anthropic Chrome extension, beta November 2025+): lets Claude operate within your existing Chrome session using your real cookies and logins. Available to Max plan subscribers. Not an open API, so there is no programmatic control. Good for interactive personal tasks; not suitable for batch automation.
- **Gemini / Deep Research** (Google): browsing is grounded via Google Search API calls, not a live browser session. Deep Research synthesizes across many searches but cannot interact with pages (click, fill forms, navigate dynamically). Project Mariner (experimental computer use) is a separate, limited research preview.
- **Copilot / Edge** (Microsoft): Edge's Copilot Mode uses Bing retrieval with some ability to navigate pages. Real-world tests show high latency (6+ minutes for multi-page comparison tasks) and frequent interruptions for user confirmation. Not a developer API.

## Quick Start

### Install

**Linux / macOS (x64 / ARM64):**

```bash
curl -fsSL https://raw.githubusercontent.com/Mingye-Lu/AgenticCrawler/main/install.sh | bash
```

**Windows (x64, PowerShell):**

```powershell
irm https://raw.githubusercontent.com/Mingye-Lu/AgenticCrawler/main/install.ps1 | iex
```

This downloads the latest binary, verifies its SHA256 checksum, and sets up CloakBrowser for stealth browser automation. Requires Node.js 20+ for browser features.

acrawl checks for updates on startup and shows a notification when a new version is available.
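The install scripts are described as verifying a SHA256 checksum. For anyone who downloads a release binary by hand, the same kind of check can be sketched as below. This is a generic illustration, not the installer's own code: the function names are hypothetical, and where the project publishes its expected digests is not stated here, so the expected value must come from whatever checksum the release provides.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_hex: str) -> bool:
    """Compare the file's SHA256 with a published hex digest (case and whitespace tolerant)."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

Calling `verify_download(path_to_binary, published_digest)` returns `True` only when the digests match; a mismatch means the file should be discarded and downloaded again.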
<details>
<summary>Build from source</summary>

```bash
git clone https://github.com/Mingye-Lu/AgenticCrawler.git
cd AgenticCrawler
cargo build --release

# Install CloakBrowser (required for browser automation; the binary auto-downloads on first use)
npm install
```

</details>

### Browser Extension (optional)

The acrawl Bridge extension lets acrawl control your real browser (with your sessions, cookies, and existing extensions) instead of a headless CloakBrowser instance.

Download `acrawl-extension.zip` from the [latest release](https://github.com/Mingye-Lu/AgenticCrawler/releases/latest), unzip it, then load it into your browser:

| Browser | Extensions page | Developer mode toggle |
|---------|----------------|----------------------|
| Chrome | `c
What people ask about AgenticCrawler
What is Mingye-Lu/AgenticCrawler?
Mingye-Lu/AgenticCrawler is an MCP server for the Claude AI ecosystem. acrawl is an LLM-powered web crawler: describe what you want in plain English, get structured data back. Single Rust binary, 25 providers, MCP server built-in. It has 5 stars on GitHub and was last updated today.
How do I install AgenticCrawler?
You can install AgenticCrawler by cloning the repository (https://github.com/Mingye-Lu/AgenticCrawler) or by following the instructions in the README on GitHub. ClaudeWave also offers quick-install blocks on this page.
Is Mingye-Lu/AgenticCrawler safe to use?
Our security agent has analyzed Mingye-Lu/AgenticCrawler and assigned it a Trust Score of 87/100 (tier: Trusted). See the full breakdown of passed checks and flags on this page.
Who maintains Mingye-Lu/AgenticCrawler?
Mingye-Lu/AgenticCrawler is maintained by Mingye-Lu. The latest activity recorded on GitHub is from today, with 3 open issues.
Are there alternatives to AgenticCrawler?
Yes. On ClaudeWave you can explore similar MCP servers at /categories/mcp, sorted by popularity or recent activity.