Skip to main content
ClaudeWave
Skill65 repo starsupdated 27d ago

agent-browser

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.

Install in Claude Code
Copy
git clone --depth 1 https://github.com/avibebuilder/claude-prime /tmp/agent-browser && cp -r /tmp/agent-browser/.claude/skills/agent-browser ~/.claude/skills/agent-browser
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Browser Automation with agent-browser

Use this skill to drive websites through the `agent-browser` CLI. Keep the main loop tight: inspect the page, act with refs, verify the result, and only pull deeper docs when the task actually needs them.

## Core workflow

Prefer `agent-browser` directly for speed. Use `npx agent-browser` only if it is not installed globally.

For most tasks, follow this loop:

1. Open the page
2. Wait for the relevant state
3. Snapshot with refs
4. Interact using those refs
5. Re-snapshot after page or DOM changes
6. Verify the outcome
7. Close the session when done

```bash
agent-browser open https://example.com/form
agent-browser wait --load networkidle
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i
```

Chain commands with `&&` only when you do not need to inspect intermediate output. Good: `open && wait && screenshot`. Bad: `snapshot && click` when you still need to read the refs from the snapshot.

## Route fast

Read only the reference that matches the task:

| Need | Read |
| --- | --- |
| Full command or flag lookup | [references/commands.md](references/commands.md) |
| Ref lifecycle, stale refs, snapshot strategy | [references/snapshot-refs.md](references/snapshot-refs.md) |
| Login flows, OAuth, 2FA, saved auth state | [references/authentication.md](references/authentication.md) |
| Parallel sessions, state reuse, cleanup | [references/session-management.md](references/session-management.md) |
| Recording, profiling, local files, config, iOS, security | [references/advanced-usage.md](references/advanced-usage.md) |
| Proxy setup | [references/proxy-support.md](references/proxy-support.md) |
| Recording workflows | [references/video-recording.md](references/video-recording.md) |
| Profiling workflows | [references/profiling.md](references/profiling.md) |

## Golden path commands

Use these first; go to the command reference only when you need something more specific.

```bash
agent-browser open <url>
agent-browser wait --load networkidle
agent-browser snapshot -i
agent-browser click @e1
agent-browser fill @e2 "text"
agent-browser select @e3 "option"
agent-browser get url
agent-browser get text @e1
agent-browser diff snapshot
agent-browser screenshot --annotate
agent-browser close
```

## Refs are the default interaction model

The main value of `agent-browser` is that snapshots produce compact refs like `@e1`, `@e2`, `@e3`. Those refs are cheaper and more reliable than repeatedly reasoning from raw HTML or long selectors.

Treat refs as short-lived. Re-snapshot after anything that can change the page state, especially:
- navigation
- form submission
- opening dropdowns or modals
- lazy-loaded or client-rendered content

If a ref fails or the page looks different from what you expected, your next move is usually `agent-browser snapshot -i`, not another blind click.

For the full lifecycle and troubleshooting rules, read [references/snapshot-refs.md](references/snapshot-refs.md).

## Choose the lightest tool that still proves the result

Default order:
1. `snapshot -i` for structure and interactive targets
2. `get text`, `get url`, or `get title` for precise verification
3. `diff snapshot` when you need to confirm something changed
4. `screenshot --annotate` when layout, icon-only controls, canvas, or visual context matters
5. semantic locators or `eval` only when refs are unavailable or the task truly needs them

If you need semantic locators, JavaScript evaluation, local file access, annotated screenshots, or config details, jump to [references/advanced-usage.md](references/advanced-usage.md) and [references/commands.md](references/commands.md).

## Authentication: decide sensitivity first

Before filling any credential, classify the auth flow.

- **Non-sensitive**: localhost, staging, test accounts, or credentials the user explicitly provided for this task. The agent can usually fill these directly.
- **Sensitive**: production domains, real user accounts, OAuth/SSO, or anything where the agent should not handle the secret. In that case, reach the auth step, switch to a headed browser if needed, and let the user complete sign-in manually.

After either path succeeds, offer to save reusable state if it would help next time. Do not auto-save credentials or session state without asking.

Use [references/authentication.md](references/authentication.md) for the exact decision rules and storage patterns.

## Sessions: isolate work on purpose

Use named sessions when you are:
- running parallel browser tasks
- comparing two sites or variants
- preserving auth state for reuse
- avoiding interference across agents

When multiple agents may browse concurrently, use a named session from the start and close it explicitly when finished. Prefer semantic names over generic ones.

For session reuse and cleanup patterns, read [references/session-management.md](references/session-management.md).

## Security defaults for AI-driven browsing

If the page is untrusted or may contain hostile content, enable content boundaries before inspecting rich output. If the task is scoped to a known target, consider an allowlist for trusted domains.

The main point is to keep page content clearly separated from tool output and to narrow where the browser is allowed to go when the task permits it.

See [references/advanced-usage.md](references/advanced-usage.md) for content boundaries, domain allowlists, action policy, and output limits.

## Failure modes to catch early

- **Stale refs**: you interacted after the page changed without re-snapshotting
- **Missing waits**: you captured or clicked before async content settled
- **Visual-only UI**: text snapshots missed icon buttons, canvas, or spatial layout
- **Shell quoting in `eval`**: use the safer pattern
askSkill

Answer questions about code, architecture, and technical decisions — no implementation. Trigger on questions asking 'why', 'what does this do', 'what is the purpose of', 'explain', 'what's the difference', 'compare', or 'what are the tradeoffs' — even when referencing specific files, code snippets, or inline code. The key signal is the user wants to UNDERSTAND something, not change it. Do NOT trigger for requests to build, fix, plan, review, research, or add/modify code.

cookSkill

Implement, build, create, or add any feature, endpoint, page, component, or functionality. Use this skill whenever the user asks you to write new code or make code changes — whether it's adding an API endpoint, building a UI page, creating an export feature, wiring up a webhook, implementing a search/filter, or any other hands-on coding task. This is the default skill for all 'build this', 'add this', 'create this', 'wire up', 'implement' requests. Covers the full cycle: clarify requirements, plan if needed, write code, verify, and review. Do NOT use for pure research, debugging, documentation, or explanation — only when the user wants working code delivered.

create-docSkill

Use when the user wants to save knowledge as a file so others don't have to rediscover it — \"turn this into a doc\", \"write this up\", \"document how X works\", \"we figured this out and want to capture it\", \"nobody should have to figure this out again\". Covers any request to create or update durable written artifacts: onboarding guides, runbooks, ADRs, API docs, architecture notes, postmortems, changelogs, setup guides. The trigger: user wants knowledge captured in a file for future reference, not just a conversation. Do NOT use when still making decisions (→ give-plan), just asking for explanation without a file (→ ask), or writing code (→ cook).

diagnoseSkill

Investigate unexpected behavior and mysterious bugs. Use when the cause of a problem is unknown and the user needs to understand WHY something is happening — symptoms like: sudden unexplained changes in metrics or behavior, works locally but not in staging/production, inconsistent or intermittent failures, correct code producing wrong results, operations succeeding but having no effect, environment-specific failures, duplicate executions, stale data, or any \"why did this change?\" or \"why is this happening?\" situation. Covers infrastructure anomalies (cache hit rates dropping, latency spikes, queue behavior shifts) as well as code bugs. The key signal is confusion about root cause, not a request to implement a known fix. Do NOT use for feature requests, known fixes, planning, or documentation tasks.

discussSkill

Brainstorms and debates approaches, then drives toward an actionable decision. Use whenever someone needs a thinking partner for a decision they're facing: 'discuss', 'debate', 'brainstorm', 'weigh options', 'tradeoffs', 'should I do X or Y', 'help me decide', 'I'm torn between', 'sanity check my thinking', or 'what do you think about'. The user must be asking for help reasoning through a choice — not asking to build, fix, evaluate, plan, or modify something (even if the topic involves this skill itself). Picks the right decision lens, surfaces tradeoffs and blind spots, pushes back when reasoning is genuinely weak, and never implements.

docs-seekerSkill

Fetch up-to-date documentation for any library, framework, API, or service into context. Use when the user wants to look up API references, check function signatures or required fields, find feature-specific docs, or verify how an external tool actually works. Triggers for queries about third-party libraries like Stripe, SQLAlchemy, Tailwind, FastAPI, shadcn, Drizzle, Hono, Better Auth — any time the answer lives in official docs rather than in the project codebase. Use this instead of guessing from trained knowledge, which is stale.

fixSkill

Fix bugs and broken behavior when there is enough evidence to act on a repair path. Use for errors, crashes, incorrect results, API failures (500, 404, 403), CORS problems, database exceptions, broken rendering, duplicated or wrong data, off-by-one mistakes, timezone/date bugs, broken forms, config-caused runtime failures, and regressions. Trigger when the user wants the bug repaired and the conversation already contains a clear failing area, a reproducible failing test, a concrete error path, or a prior diagnosis to implement. Do NOT use for new features, pure explanation, architecture discussion, broad research, or bug reports where the main need is figuring out why the behavior happens — use diagnose for that.

frontend-designSkill

Builds distinctive, production-grade UIs that avoid generic AI aesthetics. Use whenever the user wants to build, restyle, or give visual direction to any interface — pages, dashboards, landing pages, components, onboarding flows, mobile screens, or design systems — even without an explicit 'design' request. Also triggers for: picking an aesthetic direction, improving the look of a dull/generic existing page, adding visual personality, or choosing colors/typography. Includes a bundled design intelligence database for concrete guidance across web (React, Next.js, Vue, Tailwind) and mobile (React Native, Flutter, SwiftUI).