Skill · 867 repo stars · updated yesterday

agent-browser

agent-browser is a command-line tool for automating browser interactions within AI agent workflows. Use it to navigate websites, fill forms, click buttons, take screenshots, extract data, or test web applications programmatically. Installation is required on first use, followed by a one-time Chromium download. The tool supports commands for opening URLs, capturing page snapshots, clicking elements, and retrieving page content.

Install in Claude Code
git clone --depth 1 https://github.com/hAcKlyc/MyAgents /tmp/agent-browser && cp -r /tmp/agent-browser/bundled-skills/agent-browser ~/.claude/skills/agent-browser
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Browser Automation with agent-browser

## First-time Setup

The CLI is **not pre-installed** with the app — install it on first use, then download Chromium. **Always run this self-check before issuing any agent-browser command in a fresh environment.**

### Step 1: Verify or install the CLI

```bash
# Probe by RUNNING the CLI (not just checking PATH presence). This catches the
# common case of a stale wrapper from a previous app version: it lives at
# ~/.myagents/bin/agent-browser, satisfies `command -v`, but execs a deleted
# bundle path → fails with "cannot find file". `--version` exercises the real
# code path and triggers the install fallback when broken.
agent-browser --version >/dev/null 2>&1 || npm_config_prefix="${MYAGENTS_NPM_GLOBAL_PREFIX:-$HOME/.myagents/npm-global}" npm install -g agent-browser@0.15.1
```

By default the install lands in `~/.myagents/npm-global/bin/agent-browser`, which sits earlier in PATH than any legacy wrapper. MyAgents exposes `MYAGENTS_NPM_GLOBAL_PREFIX` so that the prefix can be applied to this one command rather than exported as `npm_config_prefix` for the whole shell; nvm warns when it finds that variable set, so scoping it to the install keeps nvm-based user shells quiet. Subsequent `agent-browser …` calls find the new binary automatically.
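The scoping of that assignment can be illustrated without touching npm. The sketch below is only an illustration: `show_prefix_scope` is a hypothetical helper name, and `sh -c` stands in for `npm install -g`. It shows that the `${VAR:-default}` expansion uses `MYAGENTS_NPM_GLOBAL_PREFIX` when it is set and falls back to `$HOME/.myagents/npm-global` otherwise, and that the value reaches only the child process.

```bash
# Hypothetical helper for illustration; `sh -c` stands in for `npm install -g`.
show_prefix_scope() {
  # Prefixing a command with VAR=value passes VAR to that one child process only.
  npm_config_prefix="${MYAGENTS_NPM_GLOBAL_PREFIX:-$HOME/.myagents/npm-global}" \
    sh -c 'echo "child sees: $npm_config_prefix"'
  # Back in the calling shell, the line above did not set the variable.
  echo "shell sees: ${npm_config_prefix:-<unset>}"
}
```

Calling `show_prefix_scope` in a shell where `npm_config_prefix` is not already set prints the resolved prefix for the child and `<unset>` for the shell itself.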

Tell the user **once** that you're installing the browser tool (a few seconds the first time, instant afterward), then proceed.

#### If `npm install -g` fails

If the install fails (network blocked, registry unreachable, EACCES on a locked-down system Node), invoke the CLI via `npx` **inline on every command** — Bash aliases do not persist across separate tool calls in this environment, so each command must carry the prefix:

```bash
# Use the npx prefix on EVERY command. Do not try to alias — it won't survive.
npx -y agent-browser@0.15.1 open https://example.com
npx -y agent-browser@0.15.1 snapshot -i
npx -y agent-browser@0.15.1 click @e1
```

This is slower (~1s overhead per call) but works without an install step.
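Within a single tool call, a shell variable can carry the prefix, since it only has to live for that one invocation. A minimal sketch, reusing the `--version` probe from Step 1; `pick_prefix` is a hypothetical helper name, not an agent-browser feature:

```bash
# Hypothetical helper: print the command prefix that works in this environment.
pick_prefix() {
  if agent-browser --version >/dev/null 2>&1; then
    echo "agent-browser"                 # the installed CLI runs correctly
  else
    echo "npx -y agent-browser@0.15.1"   # fall back to the inline npx form
  fi
}

# Usage, all inside ONE shell invocation (the variable is gone afterward):
#   AB="$(pick_prefix)"
#   $AB open https://example.com && $AB snapshot -i
```

`$AB` is left unquoted on purpose so that bash splits the npx form into separate words. The variable does not survive into the next tool call, so the next call has to define it again or spell out the prefix.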

### Step 2: Download Chromium (~160MB, one-time)

```bash
agent-browser install
# OR if you're on the npx fallback:
# npx -y agent-browser@0.15.1 install
```

Tell the user that this download may take a minute or more on a slow connection.

### Troubleshooting

| Symptom | Fix |
|---------|-----|
| `agent-browser` runs but shows "cannot find file" | Stale wrapper from a previous app version is shadowing the new install. The new install at `~/.myagents/npm-global/bin/` should win on PATH; if it doesn't, run `which agent-browser` to see which path resolves first, then either remove the stale path or invoke the new binary by its absolute path. |
| `npm install -g` fails with a blocked-registry or network error | Use the `npx` inline fallback above. If `npx` also fails, the user's network is blocking the npm registry; ask them about a proxy or VPN. |
| `agent-browser install` fails to download Chromium | Network issue (for example, the Great Firewall). The user may need a proxy or VPN; ask them. |
| `Executable doesn't exist` mid-task | Chromium got deleted or the install never finished. Re-run `agent-browser install`. |
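
For the stale-wrapper case, it can help to list every match on PATH in lookup order rather than only the first one. A minimal sketch; `list_candidates` is a hypothetical helper name, and it only inspects the filesystem, so it works even when the CLI itself is broken:

```bash
# Hypothetical helper: print every executable file named $1 on PATH, first match first.
list_candidates() {
  name="$1"
  old_ifs="$IFS"
  IFS=:                           # split PATH on colons
  for dir in $PATH; do
    candidate="${dir:-.}/$name"   # an empty PATH entry means the current directory
    if [ -f "$candidate" ] && [ -x "$candidate" ]; then
      echo "$candidate"
    fi
  done
  IFS="$old_ifs"                  # restore normal word splitting
}

# Usage: list_candidates agent-browser
```

The first line printed is the copy the shell will run. Any entry listed ahead of the one under `~/.myagents/npm-global/bin/` is a candidate for the stale wrapper.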

## Core Workflow

Every browser automation follows this pattern:

1. **Navigate**: `agent-browser open <url>`
2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
3. **Interact**: Use refs to click, fill, select
4. **Re-snapshot**: After navigation or DOM changes, get fresh refs

```bash
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i  # Check result
```
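
When snapshot output is captured in a script, the refs can be pulled out with a pattern match. A minimal sketch, assuming refs appear in the `@eN` form shown in the comment above; the exact snapshot layout is not specified here, and `extract_refs` is a hypothetical helper name:

```bash
# Hypothetical helper: read snapshot text on stdin, print each distinct ref once, in order.
extract_refs() {
  grep -oE '@e[0-9]+' | awk '!seen[$0]++'
}

# Usage: agent-browser snapshot -i | extract_refs
```

Refs extracted this way are only valid until the next navigation or DOM change, so the list should be rebuilt from a fresh snapshot each time.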

## Command Chaining

Commands can be chained with `&&` in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.

```bash
# Chain open + wait + snapshot in one call
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i

# Chain multiple interactions
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3

# Navigate and capture
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
```

**When to chain:** Use `&&` when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).
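
Part of what makes `&&` chaining safe is that the shell stops at the first command that exits non-zero, so later steps never act on a page that failed to load. A minimal sketch with a stand-in function instead of the real CLI, assuming agent-browser exits non-zero on failure; `step` and `demo_chain` are hypothetical names used only for this illustration:

```bash
# Stand-in for an agent-browser command: prints its name, then exits with the given status.
step() {
  echo "run: $1"
  return "$2"
}

demo_chain() {
  # The second step fails, so the third is never executed.
  step open 0 && step wait 1 && step snapshot 0
  echo "chain exit status: $?"
}
```

Calling `demo_chain` prints `run: open`, `run: wait`, and `chain exit status: 1`; the `snapshot` step is skipped.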

## Essential Commands

```bash
# Navigation
agent-browser open <url>              # Navigate (aliases: goto, navigate)
agent-browser close                   # Close browser

# Snapshot
agent-browser snapshot -i             # Interactive elements with refs (recommended)
agent-browser snapshot -i -C          # Include cursor-interactive elements (divs with onclick, cursor:pointer)
agent-browser snapshot -s "#selector" # Scope to CSS selector

# Interaction (use @refs from snapshot)
agent-browser click @e1               # Click element
agent-browser click @e1 --new-tab     # Click and open in new tab
agent-browser fill @e2 "text"         # Clear and type text
agent-browser type @e2 "text"         # Type without clearing
agent-browser select @e1 "option"     # Select dropdown option
agent-browser check @e1               # Check checkbox
agent-browser press Enter             # Press key
agent-browser keyboard type "text"    # Type at current focus (no selector)
agent-browser keyboard inserttext "text"  # Insert without key events
agent-browser scroll down 500         # Scroll page
agent-browser scroll down 500 --selector "div.content"  # Scroll within a specific container

# Get information
agent-browser get text @e1            # Get element text
agent-browser get url                 # Get current URL
agent-browser get title               # Get page title

# Wait
agent-browser wait --load networkidle  # Wait until network activity settles
```