browser-automation-agent
Automate web browsers for AI agents using agent-browser CLI with deterministic element selection.
git clone --depth 1 https://github.com/besoeasy/open-skills /tmp/browser-automation-agent && cp -r /tmp/browser-automation-agent/skills/browser-automation-agent ~/.claude/skills/browser-automation-agentSKILL.md
# Browser Automation with Agent-Browser
Agent-browser is a headless browser automation CLI designed specifically for AI agents. It provides fast browser control with deterministic element selection through accessibility tree snapshots, making it ideal for agent-driven web automation workflows.
## When to use
- Use case 1: When the user asks to automate web interactions (fill forms, click buttons, navigate sites)
- Use case 2: When you need to capture screenshots or generate PDFs of web pages
- Use case 3: For web scraping tasks that require JavaScript rendering or complex interactions
- Use case 4: When building automation workflows that need deterministic element references
- Use case 5: For testing web applications with agent-driven scenarios
## Required tools / APIs
- No external API required (runs locally)
- agent-browser: Headless browser CLI with Rust/Node.js implementation
- Chromium: Downloaded automatically during installation
Install options:
```bash
# via npm (global)
npm install -g agent-browser
agent-browser install # Downloads Chromium
# via Homebrew (macOS/Linux)
brew install agent-browser
# Verify installation
agent-browser --version
```
## Skills
### browser_open_and_snapshot
Open a URL and capture the accessibility tree to identify interactive elements.
```bash
# Open a webpage
agent-browser open https://example.com
# Get snapshot with element references
agent-browser snapshot
# The snapshot shows elements with @e1, @e2 references
# Example output:
# @e1 button "Sign In"
# @e2 input "Email" (email)
# @e3 input "Password" (password)
```
**Node.js:**
```javascript
const { execSync } = require('child_process');
function browserCommand(cmd) {
return execSync(`agent-browser ${cmd}`, { encoding: 'utf-8' });
}
async function openAndSnapshot(url) {
browserCommand(`open ${url}`);
await new Promise(r => setTimeout(r, 2000)); // Wait for page load
const snapshot = browserCommand('snapshot');
return snapshot; // Returns element tree with references
}
// Usage
// const elements = await openAndSnapshot('https://example.com');
// console.log(elements);
```
### browser_interact
Interact with page elements using deterministic references from snapshots.
```bash
# Fill a form field
agent-browser fill @e2 "user@example.com"
agent-browser fill @e3 "password123"
# Click a button
agent-browser click @e1
# Type text into active element
agent-browser type "search query" --enter
# Navigate
agent-browser back
agent-browser forward
agent-browser reload
```
**Node.js:**
```javascript
function fillForm(formData) {
for (const [ref, value] of Object.entries(formData)) {
execSync(`agent-browser fill ${ref} "${value}"`, { encoding: 'utf-8' });
}
}
function clickElement(ref) {
return execSync(`agent-browser click ${ref}`, { encoding: 'utf-8' });
}
// Usage
// fillForm({ '@e2': 'user@example.com', '@e3': 'password123' });
// clickElement('@e1');
```
### browser_capture
Capture screenshots, PDFs, or extract page content.
```bash
# Take a screenshot
agent-browser screenshot output.png
# Generate PDF
agent-browser pdf document.pdf
# Get page text content
agent-browser text
# Get HTML source
agent-browser html
# Get specific element attribute
agent-browser attribute @e5 href
```
**Node.js:**
```javascript
function captureScreenshot(filename) {
return execSync(`agent-browser screenshot ${filename}`, { encoding: 'utf-8' });
}
function generatePDF(filename) {
return execSync(`agent-browser pdf ${filename}`, { encoding: 'utf-8' });
}
function getPageText() {
return execSync('agent-browser text', { encoding: 'utf-8' });
}
function getElementAttribute(ref, attr) {
return execSync(`agent-browser attribute ${ref} ${attr}`, { encoding: 'utf-8' }).trim();
}
// Usage
// captureScreenshot('page.png');
// const text = getPageText();
// const link = getElementAttribute('@e10', 'href');
```
### browser_session_management
Manage browser sessions, tabs, and persistent state.
```bash
# Session management
agent-browser open https://example.com --session myapp
agent-browser close --session myapp
# Tab management
agent-browser open https://example.com --new-tab
agent-browser tabs list
agent-browser tabs switch 0
# Cookie and storage
agent-browser cookies get example.com
agent-browser storage set mykey "myvalue"
agent-browser storage get mykey
# Close browser
agent-browser close
```
**Node.js:**
```javascript
function openSession(url, sessionName) {
return execSync(`agent-browser open ${url} --session ${sessionName}`, { encoding: 'utf-8' });
}
function closeSession(sessionName) {
return execSync(`agent-browser close --session ${sessionName}`, { encoding: 'utf-8' });
}
function manageStorage(action, key, value = null) {
const cmd = value
? `agent-browser storage ${action} ${key} "${value}"`
: `agent-browser storage ${action} ${key}`;
return execSync(cmd, { encoding: 'utf-8' }).trim();
}
// Usage
// openSession('https://app.example.com', 'shopping-session');
// manageStorage('set', 'cart-id', '12345');
// const cartId = manageStorage('get', 'cart-id');
```
## Rate limits / Best practices
- Add delays between interactions (1-2 seconds) to allow page rendering
- Use `--wait` flag for actions that trigger navigation or async updates
- Close browser sessions when done to free system resources
- Use `--session` flags to isolate different automation workflows
- Cache snapshots when repeatedly interacting with the same page structure
- Prefer element references (@e1) over selectors for deterministic behavior
## Agent prompt
```text
You have browser automation capability through agent-browser. When a user asks to automate web interactions:
1. Open the URL with `agent-browser open <url>`
2. Get the accessibility snapshot with `agent-browser snapshot` to identify interactive elements
3. Parse the snapshot output to find element references (like @e1, @e2)
4. Use `fill`, `click`, or `type` commands with element references to interact
5. UseEncrypt and decrypt files or streams using age — a simple, modern, and secure encryption tool with small explicit keys, passphrase support, SSH key support, post-quantum hybrid keys, and UNIX-style composability. No config options, no footguns.
Upload and host files anonymously using decentralized storage with Originless and IPFS.
Star all repositories from a GitHub user automatically. Use when: (1) Supporting open source creators, (2) Bulk discovery of useful projects, or (3) Automating GitHub engagement.
Automatically creates user-facing changelogs from git commits by analyzing commit history, categorizing changes, and transforming technical commits into clear, customer-friendly release notes. Turns hours of manual changelog writing into minutes of automated generation.
Log all chat messages to a SQLite database for searchable history and audit. Use when: (1) Building chat history, (2) Auditing conversations, (3) Searching past messages, or (4) User asks to log chats.
Check cryptocurrency wallet balances across multiple blockchains using free public APIs.
Calculate line-of-sight and road distances between two cities using free OpenStreetMap services.
Research and create modern, animated tourism websites for cities with historical facts, places to visit, and colorful designs.