agent-browser
agent-browser is a command-line tool for automating browser interactions programmatically. Use it to navigate websites, take screenshots, fill forms, click buttons, extract data, and perform web scraping or testing tasks. The tool works by opening a persistent browser session, taking interactive snapshots to identify elements with references like @e1, then interacting with those elements through commands like fill, click, and select.
git clone --depth 1 https://github.com/openteams-lab/openteams /tmp/agent-browser && cp -r /tmp/agent-browser/assets/skills/agent-browser ~/.claude/skills/agent-browserSKILL.md
# Browser Automation with agent-browser ## Core Workflow Every browser automation follows this pattern: 1. **Navigate**: `agent-browser open <url>` 2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`) 3. **Interact**: Use refs to click, fill, select 4. **Re-snapshot**: After navigation or DOM changes, get fresh refs ```bash agent-browser open https://example.com/form agent-browser snapshot -i # Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit" agent-browser fill @e1 "user@example.com" agent-browser fill @e2 "password123" agent-browser click @e3 agent-browser wait --load networkidle agent-browser snapshot -i # Check result ``` ## Command Chaining Commands can be chained with `&&` in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls. ```bash # Chain open + wait + snapshot in one call agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i # Chain multiple interactions agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3 # Navigate and capture agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png ``` **When to chain:** Use `&&` when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs). ## Essential Commands ```bash # Navigation agent-browser open <url> # Navigate (aliases: goto, navigate) agent-browser close # Close browser # Snapshot agent-browser snapshot -i # Interactive elements with refs (recommended) agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer) agent-browser snapshot -s "#selector" # Scope to CSS selector # Interaction (use @refs from snapshot) agent-browser click @e1 # Click element agent-browser click @e1 --new-tab # Click and open in new tab agent-browser fill @e2 "text" # Clear and type text agent-browser type @e2 "text" # Type without clearing agent-browser select @e1 "option" # Select dropdown option agent-browser check @e1 # Check checkbox agent-browser press Enter # Press key agent-browser keyboard type "text" # Type at current focus (no selector) agent-browser keyboard inserttext "text" # Insert without key events agent-browser scroll down 500 # Scroll page agent-browser scroll down 500 --selector "div.content" # Scroll within a specific container # Get information agent-browser get text @e1 # Get element text agent-browser get url # Get current URL agent-browser get title # Get page title # Wait agent-browser wait @e1 # Wait for element agent-browser wait --load networkidle # Wait for network idle agent-browser wait --url "**/page" # Wait for URL pattern agent-browser wait 2000 # Wait milliseconds # Downloads agent-browser download @e1 ./file.pdf # Click element to trigger download agent-browser wait --download ./output.zip # Wait for any download to complete agent-browser --download-path ./downloads open <url> # Set default download directory # Capture agent-browser screenshot # Screenshot to temp dir agent-browser screenshot --full # Full page screenshot agent-browser screenshot --annotate # Annotated screenshot with numbered element labels agent-browser pdf output.pdf # Save as PDF # Diff (compare page states) agent-browser diff snapshot # Compare current vs last snapshot agent-browser diff snapshot --baseline before.txt # Compare current vs saved file agent-browser diff screenshot --baseline before.png # Visual pixel diff agent-browser diff url <url1> <url2> # Compare two pages agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait strategy agent-browser diff url <url1> <url2> --selector "#main" # Scope to element ``` ## Common Patterns ### Form Submission ```bash agent-browser open https://example.com/signup agent-browser snapshot -i agent-browser fill @e1 "Jane Doe" agent-browser fill @e2 "jane@example.com" agent-browser select @e3 "California" agent-browser check @e4 agent-browser click @e5 agent-browser wait --load networkidle ``` ### Authentication with Auth Vault (Recommended) ```bash # Save credentials once (encrypted with AGENT_BROWSER_ENCRYPTION_KEY) # Recommended: pipe password via stdin to avoid shell history exposure echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin # Login using saved profile (LLM never sees password) agent-browser auth login github # List/show/delete profiles agent-browser auth list agent-browser auth show github agent-browser auth delete github ``` ### Authentication with State Persistence ```bash # Login once and save state agent-browser open https://app.example.com/login agent-browser snapshot -i agent-browser fill @e1 "$USERNAME" agent-browser fill @e2 "$PASSWORD" agent-browser click @e3 agent-browser wait --url "**/dashboard" agent-browser state save auth.json # Reuse in future sessions agent-browser state load auth.json agent-browser open https://app.example.com/dashboard ``` ### Session Persistence ```bash # Auto-save/restore cookies and localStorage across browser restarts agent-browser --session-name myapp open https://app.example.com/login # ... login flow ... agent-browser close # State auto-saved to ~/.agent-browser/sessions/ # Next time, state is auto-loaded agent-browser --session-name myapp open https://app.example.com/dashboard # Encrypt state at rest
Automate Ably tasks via Rube MCP (Composio). Always search tools first for current schemas.
Automate Abstract tasks via Rube MCP (Composio). Always search tools first for current schemas.
Automate Abuselpdb tasks via Rube MCP (Composio). Always search tools first for current schemas.
Automate Abyssale tasks via Rube MCP (Composio). Always search tools first for current schemas.
Automate Accelo tasks via Rube MCP (Composio). Always search tools first for current schemas.
Automate Accredible Certificates tasks via Rube MCP (Composio). Always search tools first for current schemas.
Automate Acculynx tasks via Rube MCP (Composio). Always search tools first for current schemas.
Automate ActiveCampaign tasks via Rube MCP (Composio). Always search tools first for current schemas.