Skip to main content
ClaudeWave
Skill407 estrellas del repoactualizado yesterday

agent-browser

agent-browser is a CLI tool for automating browser interactions via Chrome DevTools Protocol that generates token-efficient accessibility tree snapshots with compact element references (`@e1`, `@e2`, etc.) instead of parsing raw HTML. Use it when you need to navigate websites, click elements, fill forms, extract data, take screenshots, authenticate, or automate web app testing tasks with minimal token overhead.

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/majiayu000/claude-skill-registry /tmp/agent-browser && cp -r /tmp/agent-browser/skills/agent/agent-browser ~/.claude/skills/agent-browser
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

# agent-browser core

Fast browser automation CLI for AI agents. Chrome/Chromium via CDP, no
Playwright or Puppeteer dependency. Accessibility-tree snapshots with compact
`@eN` refs let agents interact with pages in ~200-400 tokens instead of
parsing raw HTML.

Most normal web tasks (navigate, read, click, fill, extract, screenshot) are
covered here. Load a specialized skill when the task falls outside browser
web pages — see [When to load another skill](#when-to-load-another-skill).

## The core loop

```bash
agent-browser open <url>        # 1. Open a page
agent-browser snapshot -i       # 2. See what's on it (interactive elements only)
agent-browser click @e3         # 3. Act on refs from the snapshot
agent-browser snapshot -i       # 4. Re-snapshot after any page change
```

Refs (`@e1`, `@e2`, ...) are assigned fresh on every snapshot. They become
**stale the moment the page changes** — after clicks that navigate, form
submits, dynamic re-renders, dialog opens. Always re-snapshot before your
next ref interaction.

## Quickstart

```bash
# Install once
npm i -g agent-browser && agent-browser install

# Take a screenshot of a page
agent-browser open https://example.com
agent-browser screenshot home.png
agent-browser close

# Search, click a result, and capture it
agent-browser open https://duckduckgo.com
agent-browser snapshot -i                      # find the search box ref
agent-browser fill @e1 "agent-browser cli"
agent-browser press Enter
agent-browser wait --load networkidle
agent-browser snapshot -i                      # refs now reflect results
agent-browser click @e5                        # click a result
agent-browser screenshot result.png
```

The browser stays running across commands so these feel like a single
session. Use `agent-browser close` (or `close --all`) when you're done.

## Reading a page

```bash
agent-browser snapshot                    # full tree (verbose)
agent-browser snapshot -i                 # interactive elements only (preferred)
agent-browser snapshot -i -u              # include href urls on links
agent-browser snapshot -i -c              # compact (no empty structural nodes)
agent-browser snapshot -i -d 3            # cap depth at 3 levels
agent-browser snapshot -s "#main"         # scope to a CSS selector
agent-browser snapshot -i --json          # machine-readable output
```

Snapshot output looks like:

```
Page: Example - Log in
URL: https://example.com/login

@e1 [heading] "Log in"
@e2 [form]
  @e3 [input type="email"] placeholder="Email"
  @e4 [input type="password"] placeholder="Password"
  @e5 [button type="submit"] "Continue"
  @e6 [link] "Forgot password?"
```

For unstructured reading (no refs needed):

```bash
agent-browser get text @e1                # visible text of an element
agent-browser get html @e1                # innerHTML
agent-browser get attr @e1 href           # any attribute
agent-browser get value @e1               # input value
agent-browser get title                   # page title
agent-browser get url                     # current URL
agent-browser get count ".item"           # count matching elements
```

## Interacting

```bash
agent-browser click @e1                   # click
agent-browser click @e1 --new-tab         # open link in new tab instead of navigating
agent-browser dblclick @e1                # double-click
agent-browser hover @e1                   # hover
agent-browser focus @e1                   # focus (useful before keyboard input)
agent-browser fill @e2 "hello"            # clear then type
agent-browser type @e2 " world"           # type without clearing
agent-browser press Enter                 # press a key at current focus
agent-browser press Control+a             # key combination
agent-browser check @e3                   # check checkbox
agent-browser uncheck @e3                 # uncheck
agent-browser select @e4 "option-value"   # select dropdown option
agent-browser select @e4 "a" "b"          # select multiple
agent-browser upload @e5 file1.pdf        # upload file(s)
agent-browser scroll down 500             # scroll page (up/down/left/right)
agent-browser scrollintoview @e1          # scroll element into view
agent-browser drag @e1 @e2                # drag and drop
```

### When refs don't work or you don't want to snapshot

Use semantic locators:

```bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find text "Sign In" click --exact     # exact match only
agent-browser find label "Email" fill "user@test.com"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
agent-browser find first ".card" click
agent-browser find nth 2 ".card" hover
```

Or a raw CSS selector:

```bash
agent-browser click "#submit"
agent-browser fill "input[name=email]" "user@test.com"
agent-browser click "button.primary"
```

Rule of thumb: snapshot + `@eN` refs are fastest and most reliable for
AI agents. `find role/text/label` is next best and doesn't require a prior
snapshot. Raw CSS is a fallback when the others fail.

## Waiting (read this)

Agents fail more often from bad waits than from bad selectors. Pick the
right wait for the situation:

```bash
agent-browser wait @e1                     # until an element appears
agent-browser wait 2000                    # dumb wait, milliseconds (last resort)
agent-browser wait --text "Success"        # until the text appears on the page
agent-browser wait --url "**/dashboard"    # until URL matches pattern (glob)
agent-browser wait --load networkidle      # until network idle (post-navigation)
agent-browser wait --load domcontentloaded # until DOMContentLoaded
agent-browser wait --fn "window.myApp.ready === true"  # until JS condition
```

After any page-changing action, pick one:

- Wait for a specific element you expect to appear: `wait @ref` or `wait --text "..."`.
- Wait for URL change: `wait --url "**/new-page"`.
- Wait for network idle (catch-all for SPA