agent-browser
Agent-browser is a fast CLI tool for automating browser interactions using Chrome/Chromium via the Chrome DevTools Protocol (CDP). It provides accessibility-tree snapshots and compact element references for reliable interaction with websites, web applications, and Electron desktop apps. Use it when tasks require programmatic web navigation, form filling, button clicking, screenshot capture, data extraction, web app testing, or automation of browser-based workflows that need to run at scale.
git clone --depth 1 https://github.com/mxyhi/ok-skills /tmp/agent-browser && cp -r /tmp/agent-browser/agent-browser ~/.claude/skills/agent-browser

SKILL.md
# agent-browser

Fast browser automation CLI for AI agents. Chrome/Chromium via CDP with accessibility-tree snapshots and compact `@eN` element refs.

Install: `npm i -g agent-browser && agent-browser install`

## Start here

This file is a discovery stub, not the usage guide. Before running any `agent-browser` command, load the actual workflow content from the CLI:

```bash
agent-browser skills get core             # start here: workflows, common patterns, troubleshooting
agent-browser skills get core --full      # include full command reference and templates
```

The CLI serves skill content that always matches the installed version, so instructions never go stale. The content in this stub cannot change between releases, which is why it just points at `skills get core`.

## Specialized skills

Load a specialized skill when the task falls outside browser web pages:

```bash
agent-browser skills get electron          # Electron desktop apps (VS Code, Slack, Discord, Figma, ...)
agent-browser skills get slack             # Slack workspace automation
agent-browser skills get dogfood           # Exploratory testing / QA / bug hunts
agent-browser skills get vercel-sandbox    # agent-browser inside Vercel Sandbox microVMs
agent-browser skills get agentcore         # AWS Bedrock AgentCore cloud browsers
```

Run `agent-browser skills list` to see everything available on the installed version.

## Why agent-browser

- Fast native Rust CLI, not a Node.js wrapper
- Works with any AI agent (Cursor, Claude Code, Codex, Continue, Windsurf, etc.)
- Chrome/Chromium via CDP with no Playwright or Puppeteer dependency
- Accessibility-tree snapshots with element refs for reliable interaction
- Sessions, authentication vault, state persistence, video recording
- Specialized skills for Electron apps, Slack, exploratory testing, cloud providers

## Observability Dashboard

The dashboard runs independently of browser sessions on port 4848 and can also be opened through a proxied or forwarded URL such as `https://dashboard.agent-browser.localhost`. Agents should stay on the dashboard origin: session tabs, status, and stream traffic are proxied internally, so session ports do not need to be exposed.
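
The "load the core skill before running any command" step from Start here can be wrapped in a small guard. The following is a minimal sketch, not part of agent-browser: the `load_skill` function and the `AGENT_BROWSER_BIN` override are hypothetical names introduced for illustration. Only the `skills get <name>` subcommand and the install command come from this stub.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with agent-browser).
# AGENT_BROWSER_BIN is an illustrative override so the helper can be dry-run
# against a stand-in binary; it defaults to the real CLI name.
AGENT_BROWSER_BIN="${AGENT_BROWSER_BIN:-agent-browser}"

# load_skill [name]: print the named skill's content, defaulting to "core".
# Returns 127 with an install hint on stderr if the CLI is not on PATH.
load_skill() {
  skill="${1:-core}"
  if ! command -v "$AGENT_BROWSER_BIN" >/dev/null 2>&1; then
    echo "agent-browser not found; install with: npm i -g agent-browser && agent-browser install" >&2
    return 127
  fi
  "$AGENT_BROWSER_BIN" skills get "$skill"
}
```

Calling `load_skill` with no argument fetches the core skill, and `load_skill electron` fetches a specialized one. Failing early with an install hint keeps an agent from guessing at commands when the CLI is missing.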
Build AI chat interfaces using ai-elements components — conversations, messages, tool displays, prompt inputs, and more. Use when the user wants to build a chatbot, AI assistant UI, or any AI-powered chat interface.
Autonomous iteration loop: modify, verify, keep/discard against any metric
Use when working with icons in any project. Provides CLI for searching 200+ icon libraries (Iconify) and retrieving SVGs. Commands: `better-icons search <query>` to find icons, `better-icons get <id>` to get SVG. Also available as MCP server for AI agents.
Capture a full DevTools-protocol trace of any browser automation — CDP firehose, screenshots, and DOM dumps — then bisect the stream into per-page searchable buckets. Use when the user wants to debug a failed run, audit network/console/DOM activity, attach a trace to an in-progress session, or feed structured per-page summaries back into an agent loop so its next iteration learns from the last one.
Disciplined diagnosis loop for hard bugs and performance regressions. Reproduce → minimise → hypothesise → instrument → fix → regression-test. Use when user says "diagnose this" / "debug this", reports a bug, says something is broken/throwing/failing, or describes a performance regression.
Systematically explore and test a web application to find bugs, UX issues, and other problems. Use when asked to "dogfood", "QA", "exploratory test", "find issues", "bug hunt", "test this app/site/platform", or review the quality of a web application. Produces a structured report with full reproduction evidence -- step-by-step screenshots, repro videos, and detailed repro steps for every issue -- so findings can be handed directly to the responsible teams.
Automate Electron desktop apps (VS Code, Slack, Discord, Figma, Notion, Spotify, etc.) using agent-browser via Chrome DevTools Protocol. Use when the user needs to interact with an Electron app, automate a desktop app, connect to a running app, control a native app, or test an Electron application. Triggers include "automate Slack app", "control VS Code", "interact with Discord app", "test this Electron app", "connect to desktop app", or any task requiring automation of a native Electron application.