agent-browser
agent-browser is a CLI tool for automating browser interactions via Chrome DevTools Protocol that generates token-efficient accessibility tree snapshots with compact element references (`@e1`, `@e2`, etc.) instead of parsing raw HTML. Use it when you need to navigate websites, click elements, fill forms, extract data, take screenshots, authenticate, or automate web app testing tasks with minimal token overhead.
git clone --depth 1 https://github.com/majiayu000/claude-skill-registry /tmp/agent-browser && cp -r /tmp/agent-browser/skills/agent/agent-browser ~/.claude/skills/agent-browser
SKILL.md
# agent-browser core

Fast browser automation CLI for AI agents. Chrome/Chromium via CDP, no Playwright or Puppeteer dependency. Accessibility-tree snapshots with compact `@eN` refs let agents interact with pages in ~200-400 tokens instead of parsing raw HTML.

Most normal web tasks (navigate, read, click, fill, extract, screenshot) are covered here. Load a specialized skill when the task falls outside browser web pages; see [When to load another skill](#when-to-load-another-skill).

## The core loop

```bash
agent-browser open <url>        # 1. Open a page
agent-browser snapshot -i       # 2. See what's on it (interactive elements only)
agent-browser click @e3         # 3. Act on refs from the snapshot
agent-browser snapshot -i       # 4. Re-snapshot after any page change
```

Refs (`@e1`, `@e2`, ...) are assigned fresh on every snapshot. They become **stale the moment the page changes**: after clicks that navigate, form submits, dynamic re-renders, dialog opens. Always re-snapshot before your next ref interaction.

## Quickstart

```bash
# Install once
npm i -g agent-browser && agent-browser install

# Take a screenshot of a page
agent-browser open https://example.com
agent-browser screenshot home.png
agent-browser close

# Search, click a result, and capture it
agent-browser open https://duckduckgo.com
agent-browser snapshot -i                      # find the search box ref
agent-browser fill @e1 "agent-browser cli"
agent-browser press Enter
agent-browser wait --load networkidle
agent-browser snapshot -i                      # refs now reflect results
agent-browser click @e5                        # click a result
agent-browser screenshot result.png
```

The browser stays running across commands so these feel like a single session. Use `agent-browser close` (or `close --all`) when you're done.
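
The "act, then re-snapshot" rule from the core loop can be wrapped in a small shell helper so a stale ref is harder to reuse by accident. This is a minimal sketch, not part of agent-browser: the `AB` variable and the `act_and_resnapshot` function are hypothetical names introduced here, and the helper assumes that waiting for network idle is an acceptable settle condition for the page in question.

```bash
#!/usr/bin/env bash
# Sketch only: AB and act_and_resnapshot are hypothetical names,
# not something agent-browser itself provides.
AB="${AB:-agent-browser}"   # CLI to invoke; override to point at a wrapper or stub

# Run one page-changing action, wait for the network to go idle,
# then print a fresh interactive snapshot so new refs are available.
act_and_resnapshot() {
  "$AB" "$@" || return 1                       # the action, e.g. click @e3
  "$AB" wait --load networkidle || return 1    # let the page settle
  "$AB" snapshot -i                            # old refs are stale; read new ones
}

# Usage (needs agent-browser installed and a page already open):
#   act_and_resnapshot click @e3
```

The helper stops at the first failing step, so a failed click never produces a snapshot that looks like the result of a successful action.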
## Reading a page

```bash
agent-browser snapshot              # full tree (verbose)
agent-browser snapshot -i           # interactive elements only (preferred)
agent-browser snapshot -i -u        # include href urls on links
agent-browser snapshot -i -c        # compact (no empty structural nodes)
agent-browser snapshot -i -d 3      # cap depth at 3 levels
agent-browser snapshot -s "#main"   # scope to a CSS selector
agent-browser snapshot -i --json    # machine-readable output
```

Snapshot output looks like:

```
Page: Example - Log in
URL: https://example.com/login

@e1 [heading] "Log in"
@e2 [form]
  @e3 [input type="email"] placeholder="Email"
  @e4 [input type="password"] placeholder="Password"
  @e5 [button type="submit"] "Continue"
@e6 [link] "Forgot password?"
```

For unstructured reading (no refs needed):

```bash
agent-browser get text @e1          # visible text of an element
agent-browser get html @e1          # innerHTML
agent-browser get attr @e1 href     # any attribute
agent-browser get value @e1         # input value
agent-browser get title             # page title
agent-browser get url               # current URL
agent-browser get count ".item"     # count matching elements
```

## Interacting

```bash
agent-browser click @e1                  # click
agent-browser click @e1 --new-tab        # open link in new tab instead of navigating
agent-browser dblclick @e1               # double-click
agent-browser hover @e1                  # hover
agent-browser focus @e1                  # focus (useful before keyboard input)
agent-browser fill @e2 "hello"           # clear then type
agent-browser type @e2 " world"          # type without clearing
agent-browser press Enter                # press a key at current focus
agent-browser press Control+a            # key combination
agent-browser check @e3                  # check checkbox
agent-browser uncheck @e3                # uncheck
agent-browser select @e4 "option-value"  # select dropdown option
agent-browser select @e4 "a" "b"         # select multiple
agent-browser upload @e5 file1.pdf       # upload file(s)
agent-browser scroll down 500            # scroll page (up/down/left/right)
agent-browser scrollintoview @e1         # scroll element into view
agent-browser drag @e1 @e2               # drag and drop
```
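
Putting the reading and interacting commands together, the sample login snapshot shown earlier could be driven roughly as follows. This is a hedged sketch: `AB` and `login_with_sample_refs` are hypothetical names introduced for illustration, and the refs `@e3`, `@e4`, and `@e5` are taken from the example snapshot output only. On a real page they would come from your own `snapshot -i` and will likely differ.

```bash
#!/usr/bin/env bash
# Sketch only: AB and login_with_sample_refs are hypothetical names.
AB="${AB:-agent-browser}"   # CLI to invoke; override to point at a wrapper or stub

# Fill the sample login form. @e3/@e4/@e5 are the refs from the example
# snapshot output; on a real page, read them from your own snapshot first.
login_with_sample_refs() {
  email=$1
  password=$2
  "$AB" fill @e3 "$email" || return 1       # clear, then type the email
  "$AB" fill @e4 "$password" || return 1    # clear, then type the password
  "$AB" click @e5 || return 1               # submit via the "Continue" button
  "$AB" snapshot -i                         # refs changed; take a new snapshot
}

# Usage (needs agent-browser installed and the login page already open):
#   login_with_sample_refs "user@test.com" "example-password"
```

`fill` is used rather than `type` so that any prefilled value in the inputs is cleared first.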
### When refs don't work or you don't want to snapshot

Use semantic locators:

```bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find text "Sign In" click --exact     # exact match only
agent-browser find label "Email" fill "user@test.com"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
agent-browser find first ".card" click
agent-browser find nth 2 ".card" hover
```

Or a raw CSS selector:

```bash
agent-browser click "#submit"
agent-browser fill "input[name=email]" "user@test.com"
agent-browser click "button.primary"
```

Rule of thumb: snapshot + `@eN` refs are fastest and most reliable for AI agents. `find role/text/label` is next best and doesn't require a prior snapshot. Raw CSS is a fallback when the others fail.

## Waiting (read this)

Agents fail more often from bad waits than from bad selectors. Pick the right wait for the situation:

```bash
agent-browser wait @e1                     # until an element appears
agent-browser wait 2000                    # dumb wait, milliseconds (last resort)
agent-browser wait --text "Success"        # until the text appears on the page
agent-browser wait --url "**/dashboard"    # until URL matches pattern (glob)
agent-browser wait --load networkidle      # until network idle (post-navigation)
agent-browser wait --load domcontentloaded # until DOMContentLoaded
agent-browser wait --fn "window.myApp.ready === true"  # until JS condition
```

After any page-changing action, pick one:

- Wait for a specific element you expect to appear: `wait @ref` or `wait --text "..."`.
- Wait for URL change: `wait --url "**/new-page"`.
- Wait for network idle (catch-all for SPA
Use when you need to install the embedded robot agents into either .cursor/agents or .claude/agents, selecting the destination interactively and copying the embedded agent definitions from project assets. This should trigger for requests such as Install embedded agents; Bootstrap .cursor/agents; Bootstrap .claude/agents; Copy robot agents. Part of cursor-rules-java project
Use when you need to generate an AGENTS.md file for a Java repository — covering project conventions, tech stack, file structure, commands, Git workflow, and contributor boundaries — through a modular, step-based interactive process that adapts to your specific project needs. This should trigger for requests such as Create AGENTS.md; Update AGENTS.md file; Add agent instructions. Part of cursor-rules-java project
Generated skill from request: trinity auto-boot validator
Create your OpenAI Agents SDK skill in one prompt, then learn to improve it throughout the chapter
Create your Google Agent Development Kit skill in one prompt, then learn to improve it throughout the chapter