Skip to main content
ClaudeWave
Skill717 repo starsupdated 2mo ago

agent-browser

agent-browser is a CLI tool for automating web browser interactions including navigation, form filling, clicking, screenshots, and login sequences. Use it when users need to interact with websites programmatically, such as testing web applications, automating repetitive form submissions, or capturing page states. It is not designed for web search functionality.

Install in Claude Code
Copy
git clone --depth 1 https://github.com/OtterMind/youclaw /tmp/agent-browser && cp -r /tmp/agent-browser/skills/agent-browser ~/.claude/skills/agent-browser
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Browser Automation with agent-browser

## Performance Rules (CRITICAL)

- **ALWAYS chain commands with `&&`** when you don't need intermediate output. Each separate tool call costs seconds of round-trip latency.
- **ALWAYS combine** open + wait + snapshot into one call: `agent-browser open <url> && agent-browser wait --load load && agent-browser snapshot -i`
- **ALWAYS batch** multiple interactions (fill, click, select) into one `&&` chain when refs are already known.
- **Use `--load load`** (DOM load event) by default. Only use `networkidle` when you specifically need all XHR/fetch to complete (e.g., waiting for API-driven content).
- **Do NOT snapshot after every interaction** — only re-snapshot when you need to discover new element refs (after navigation or major DOM changes).

## Core Workflow

Every browser automation follows this pattern:

1. **Navigate + Snapshot** (one call): `agent-browser open <url> && agent-browser wait --load load && agent-browser snapshot -i`
2. **Interact**: Batch all interactions using known refs in one `&&` chain
3. **Re-snapshot**: Only after navigation or major DOM changes

```bash
# Step 1: Open and discover elements (ONE tool call)
agent-browser open https://example.com/form && agent-browser wait --load load && agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

# Step 2: Batch all interactions (ONE tool call)
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3 && agent-browser wait --load load

# Step 3: Only snapshot if you need to verify or discover new elements
agent-browser snapshot -i
```

This reduces 7+ round-trips to just 2-3.

## Essential Commands

```bash
# Navigation
agent-browser open <url>              # Navigate (aliases: goto, navigate)
agent-browser close                   # Close browser

# Snapshot
agent-browser snapshot -i             # Interactive elements with refs (recommended)
agent-browser snapshot -i -C          # Include cursor-interactive elements
agent-browser snapshot -s "#selector" # Scope to CSS selector

# Interaction (use @refs from snapshot)
agent-browser click @e1               # Click element
agent-browser fill @e2 "text"         # Clear and type text
agent-browser type @e2 "text"         # Type without clearing
agent-browser select @e1 "option"     # Select dropdown option
agent-browser check @e1               # Check checkbox
agent-browser press Enter             # Press key
agent-browser scroll down 500         # Scroll page

# Get information
agent-browser get text @e1            # Get element text
agent-browser get url                 # Get current URL
agent-browser get title               # Get page title

# Wait
agent-browser wait @e1                # Wait for element
agent-browser wait --load load        # Wait for DOM load (fast, preferred)
agent-browser wait --load networkidle # Wait for network idle (slow, use only when needed)
agent-browser wait --url "**/page"    # Wait for URL pattern
agent-browser wait 2000               # Wait milliseconds

# Capture
agent-browser screenshot              # Screenshot to temp dir
agent-browser screenshot --full       # Full page screenshot
agent-browser screenshot --annotate   # Annotated screenshot with numbered element labels
agent-browser pdf output.pdf          # Save as PDF

# Diff (compare page states)
agent-browser diff snapshot                          # Compare current vs last snapshot
agent-browser diff screenshot --baseline before.png  # Visual pixel diff
agent-browser diff url <url1> <url2>                 # Compare two pages
```

## Common Patterns

### Form Submission

```bash
# Step 1: Open and discover elements (ONE call)
agent-browser open https://example.com/signup && agent-browser wait --load load && agent-browser snapshot -i

# Step 2: Fill and submit (ONE call)
agent-browser fill @e1 "Jane Doe" && agent-browser fill @e2 "jane@example.com" && agent-browser select @e3 "California" && agent-browser check @e4 && agent-browser click @e5 && agent-browser wait --load load
```

### Authentication with State Persistence

```bash
# Login: open + snapshot (ONE call)
agent-browser open https://app.example.com/login && agent-browser wait --load load && agent-browser snapshot -i

# Fill credentials + submit (ONE call)
agent-browser fill @e1 "$USERNAME" && agent-browser fill @e2 "$PASSWORD" && agent-browser click @e3 && agent-browser wait --url "**/dashboard"
agent-browser state save auth.json

# Reuse in future sessions
agent-browser state load auth.json && agent-browser open https://app.example.com/dashboard
```

### Data Extraction

```bash
# Open + snapshot in one call
agent-browser open https://example.com/products && agent-browser wait --load load && agent-browser snapshot -i
agent-browser get text @e5           # Get specific element text
agent-browser get text body > page.txt  # Get all page text

# JSON output for parsing
agent-browser snapshot -i --json
```

## Ref Lifecycle (Important)

Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after:

- Clicking links or buttons that navigate
- Form submissions
- Dynamic content loading (dropdowns, modals)

```bash
agent-browser click @e5              # Navigates to new page
agent-browser snapshot -i            # MUST re-snapshot
agent-browser click @e1              # Use new refs
```

## Annotated Screenshots (Vision Mode)

Use `--annotate` to take a screenshot with numbered labels overlaid on interactive elements.

```bash
agent-browser screenshot --annotate
# Output includes the image path and a legend:
#   [1] @e1 button "Submit"
#   [2] @e2 link "Home"
#   [3] @e3 textbox "Email"
agent-browser click @e2              # Click using ref from annotated screenshot
```

## Semantic Locators (Alternative to Refs)

When refs are unavailable or unreliable, use semantic locators:

```bash
agent-browser find text "Sign In" click
agent-browser find label "Email