agent-browser
agent-browser automates interactions with web pages through command-line controls for navigation, element inspection, and DOM manipulation. Use it when automated web testing, form submission, screenshot capture, or data extraction from websites is required, particularly for tasks involving clicking elements, filling inputs, and analyzing page structure via accessibility trees.
git clone --depth 1 https://github.com/code-yeongyu/oh-my-openagent /tmp/agent-browser && cp -r /tmp/agent-browser/packages/omo-opencode/src/features/builtin-skills/agent-browser ~/.claude/skills/agent-browserSKILL.md
# Browser Automation with agent-browser ## Quick start ```bash agent-browser open <url> # Navigate to page agent-browser snapshot -i # Get interactive elements with refs agent-browser click @e1 # Click element by ref agent-browser fill @e2 "text" # Fill input by ref agent-browser close # Close browser ``` ## Core workflow 1. Navigate: `agent-browser open <url>` 2. Snapshot: `agent-browser snapshot -i` (returns elements with refs like `@e1`, `@e2`) 3. Interact using refs from the snapshot 4. Re-snapshot after navigation or significant DOM changes ## Commands ### Navigation ```bash agent-browser open <url> # Navigate to URL (aliases: goto, navigate) agent-browser back # Go back agent-browser forward # Go forward agent-browser reload # Reload page agent-browser close # Close browser (aliases: quit, exit) ``` ### Snapshot (page analysis) ```bash agent-browser snapshot # Full accessibility tree agent-browser snapshot -i # Interactive elements only (recommended) agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, etc.) agent-browser snapshot -c # Compact (remove empty structural elements) agent-browser snapshot -d 3 # Limit depth to 3 agent-browser snapshot -s "#main" # Scope to CSS selector agent-browser snapshot -i -c -d 5 # Combine options ``` The `-C` flag is useful for modern web apps that use custom clickable elements (divs, spans) instead of standard buttons/links. ### Interactions (use @refs from snapshot) ```bash agent-browser click @e1 # Click (--new-tab to open in new tab) agent-browser dblclick @e1 # Double-click agent-browser focus @e1 # Focus element agent-browser fill @e2 "text" # Clear and type agent-browser type @e2 "text" # Type without clearing agent-browser keyboard type "text" # Type with real keystrokes (no selector, current focus) agent-browser keyboard inserttext "text" # Insert text without key events (no selector) agent-browser press Enter # Press key agent-browser press Control+a # Key combination agent-browser keydown Shift # Hold key down agent-browser keyup Shift # Release key agent-browser hover @e1 # Hover agent-browser check @e1 # Check checkbox agent-browser uncheck @e1 # Uncheck checkbox agent-browser select @e1 "value" # Select dropdown agent-browser scroll down 500 # Scroll page (--selector <sel> for container) agent-browser scrollintoview @e1 # Scroll element into view (alias: scrollinto) agent-browser drag @e1 @e2 # Drag and drop agent-browser upload @e1 file.pdf # Upload files ``` ### Get information ```bash agent-browser get text @e1 # Get element text agent-browser get html @e1 # Get innerHTML agent-browser get value @e1 # Get input value agent-browser get attr @e1 href # Get attribute agent-browser get title # Get page title agent-browser get url # Get current URL agent-browser get count ".item" # Count matching elements agent-browser get box @e1 # Get bounding box agent-browser get styles @e1 # Get computed styles ``` ### Check state ```bash agent-browser is visible @e1 # Check if visible agent-browser is enabled @e1 # Check if enabled agent-browser is checked @e1 # Check if checked ``` ### Screenshots & PDF ```bash agent-browser screenshot # Screenshot (saves to temp dir if no path) agent-browser screenshot path.png # Save to file agent-browser screenshot --full # Full page agent-browser screenshot --annotate # Annotated screenshot with numbered element labels agent-browser pdf output.pdf # Save as PDF ``` Annotated screenshots overlay numbered labels `[N]` on interactive elements. Each label corresponds to ref `@eN`, so refs work for both visual and text workflows: ```bash agent-browser screenshot --annotate ./page.png # Output: [1] @e1 button "Submit", [2] @e2 link "Home", [3] @e3 textbox "Email" agent-browser click @e2 # Click the "Home" link labeled [2] ``` ### Video recording ```bash agent-browser record start ./demo.webm # Start recording (uses current URL + state) agent-browser click @e1 # Perform actions agent-browser record stop # Stop and save video agent-browser record restart ./take2.webm # Stop current + start new recording ``` Recording creates a fresh context but preserves cookies/storage from your session. ### Wait ```bash agent-browser wait @e1 # Wait for element agent-browser wait 2000 # Wait milliseconds agent-browser wait --text "Success" # Wait for text agent-browser wait --url "**/dashboard" # Wait for URL pattern agent-browser wait --load networkidle # Wait for network idle agent-browser wait --fn "window.ready" # Wait for JS condition ``` Load states: `load`, `domcontentloaded`, `networkidle` ### Mouse control ```bash agent-browser mouse move 100 200 # Move mouse agent-browser mouse down left # Press button (left/right/middle) agent-browser mouse up left # Release button agent-browser mouse wheel 100 # Scroll wheel ``` ### Semantic locators (alternative to refs) ```bash agent-browser find role button click --name "Submit" agent-browser find text "Sign In" click agent-browser find label "Email" fill "user@test.com" agent-browser find placeholder "Search..." fill "query" agent-browser find alt "Logo" click agent-browser find title "Close" click agent-browser find testid "submit-btn" click agent-browser find first ".item" click agent-browser find last ".item" click agent-browser find nth 2 "a" text ``` Actions: `click`, `fill`, `type`, `hover`, `focus`, `check`, `uncheck`, `text` Options: `--name <name>` (filter role by accessible name), `--exact` (require exact text match) ### Browser settings ```bash agent-browser set viewport 1920 108
Compare HEAD with the latest published npm versions and list all unpublished changes by release layer. Triggers: unpublished changes, changelog, what changed, whats new.
Read-only GitHub triage for issues AND PRs. 1 item = 1 background task (category: quick). Analyzes all open items and writes evidence-backed reports to /tmp/{datetime}/. Every claim requires a GitHub permalink as proof. NEVER takes any action on GitHub - no comments, no merges, no closes, no labels. Reports only. Triggers: 'triage', 'triage issues', 'triage PRs', 'github triage'.
Adversarial multi-agent planning skill. Self-orchestrates 5 hostile category members (unspecified-low, unspecified-high, deep, ultrabrain, artistry) via team-mode for ruthless cross-critique debate, distills only the defensible insights, then MANDATORILY hands the distilled insight bundle to the `plan` agent for executable plan formalization. Use when planning needs maximum rigor and surfacing of weak assumptions, blind spots, and over-engineering. Triggers: 'hyperplan', 'hpp', '/hyperplan', 'adversarial plan', 'hostile planning', 'cross-critique plan', '하이퍼플랜', '적대적 계획', '교차 비평'.
Easter egg command - about oh-my-opencode. Triggers: omomomo, about, easter egg.
QA opencode itself, per case: verify the CLI/terminal (opencode run, db, serve, export), prove a specific plugin hook/action/event fired via the SSE event stream, smoke-test the TUI under tmux, and investigate sessions in opencode's SQLite DB by id, title/name, or message text. Ships tested helper scripts (each with a --self-test) plus per-domain references. Use whenever someone wants to QA, smoke-test, verify, or debug opencode's CLI, HTTP server, plugin hooks/events, or TUI, or to find/inspect opencode sessions in the database. Triggers: opencode qa, qa opencode, test opencode, verify opencode hook, opencode session db, find opencode session by id/name/text, opencode tui test, opencode server health, opencode event stream.
Nuclear-grade 16-agent pre-publish release gate. Runs /get-unpublished-changes to detect all changes since last npm release, spawns up to 10 ultrabrain agents for deep per-change analysis, invokes /review-work (5 agents) for holistic review, and 1 oracle for overall release synthesis. Use before EVERY npm publish. Triggers: 'pre-publish review', 'review before publish', 'release review', 'pre-release review', 'ready to publish?', 'can I publish?', 'pre-publish', 'safe to publish', 'publishing review', 'pre-publish check'.
Publish oh-my-opencode to npm via GitHub Actions workflow. Argument: <patch|minor|major>. Triggers: publish, release, deploy, npm publish.
Remove unused code from this project with ultrawork mode, LSP-verified safety, atomic commits. Triggers: remove dead code, dead code, cleanup, remove unused.