Skill128 estrellas del repoactualizado 2mo ago

browser-automation-agent

Browser-automation-agent is a headless browser CLI tool designed for AI agents to automate web interactions with deterministic element selection through accessibility tree snapshots. Use it when you need to automate form filling, button clicking, page navigation, screenshot capture, web scraping with JavaScript rendering, or agent-driven testing workflows that require reliable element references across multiple interactions.

Ver fuente Repositorio: open-skills

Instalar en Claude Code

Copiar

git clone --depth 1 https://github.com/besoeasy/open-skills /tmp/browser-automation-agent && cp -r /tmp/browser-automation-agent/skills/browser-automation-agent ~/.claude/skills/browser-automation-agent

Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

Definición

SKILL.md

# Browser Automation with Agent-Browser

Agent-browser is a headless browser automation CLI designed specifically for AI agents. It provides fast browser control with deterministic element selection through accessibility tree snapshots, making it ideal for agent-driven web automation workflows.

## When to use

- Use case 1: When the user asks to automate web interactions (fill forms, click buttons, navigate sites)
- Use case 2: When you need to capture screenshots or generate PDFs of web pages
- Use case 3: For web scraping tasks that require JavaScript rendering or complex interactions
- Use case 4: When building automation workflows that need deterministic element references
- Use case 5: For testing web applications with agent-driven scenarios

## Required tools / APIs

- No external API required (runs locally)
- agent-browser: Headless browser CLI with Rust/Node.js implementation
- Chromium: Downloaded automatically during installation

Install options:

```bash
# via npm (global)
npm install -g agent-browser
agent-browser install  # Downloads Chromium

# via Homebrew (macOS/Linux)
brew install agent-browser

# Verify installation
agent-browser --version
```

## Skills

### browser_open_and_snapshot

Open a URL and capture the accessibility tree to identify interactive elements.

```bash
# Open a webpage
agent-browser open https://example.com

# Get snapshot with element references
agent-browser snapshot

# The snapshot shows elements with @e1, @e2 references
# Example output:
# @e1 button "Sign In"
# @e2 input "Email" (email)
# @e3 input "Password" (password)
```

**Node.js:**

```javascript
const { execSync } = require('child_process');

function browserCommand(cmd) {
  return execSync(`agent-browser ${cmd}`, { encoding: 'utf-8' });
}

async function openAndSnapshot(url) {
  browserCommand(`open ${url}`);
  await new Promise(r => setTimeout(r, 2000)); // Wait for page load
  const snapshot = browserCommand('snapshot');
  return snapshot; // Returns element tree with references
}

// Usage
// const elements = await openAndSnapshot('https://example.com');
// console.log(elements);
```

### browser_interact

Interact with page elements using deterministic references from snapshots.

```bash
# Fill a form field
agent-browser fill @e2 "user@example.com"
agent-browser fill @e3 "password123"

# Click a button
agent-browser click @e1

# Type text into active element
agent-browser type "search query" --enter

# Navigate
agent-browser back
agent-browser forward
agent-browser reload
```

**Node.js:**

```javascript
function fillForm(formData) {
  for (const [ref, value] of Object.entries(formData)) {
    execSync(`agent-browser fill ${ref} "${value}"`, { encoding: 'utf-8' });
  }
}

function clickElement(ref) {
  return execSync(`agent-browser click ${ref}`, { encoding: 'utf-8' });
}

// Usage
// fillForm({ '@e2': 'user@example.com', '@e3': 'password123' });
// clickElement('@e1');
```

### browser_capture

Capture screenshots, PDFs, or extract page content.

```bash
# Take a screenshot
agent-browser screenshot output.png

# Generate PDF
agent-browser pdf document.pdf

# Get page text content
agent-browser text

# Get HTML source
agent-browser html

# Get specific element attribute
agent-browser attribute @e5 href
```

**Node.js:**

```javascript
function captureScreenshot(filename) {
  return execSync(`agent-browser screenshot ${filename}`, { encoding: 'utf-8' });
}

function generatePDF(filename) {
  return execSync(`agent-browser pdf ${filename}`, { encoding: 'utf-8' });
}

function getPageText() {
  return execSync('agent-browser text', { encoding: 'utf-8' });
}

function getElementAttribute(ref, attr) {
  return execSync(`agent-browser attribute ${ref} ${attr}`, { encoding: 'utf-8' }).trim();
}

// Usage
// captureScreenshot('page.png');
// const text = getPageText();
// const link = getElementAttribute('@e10', 'href');
```

### browser_session_management

Manage browser sessions, tabs, and persistent state.

```bash
# Session management
agent-browser open https://example.com --session myapp
agent-browser close --session myapp

# Tab management
agent-browser open https://example.com --new-tab
agent-browser tabs list
agent-browser tabs switch 0

# Cookie and storage
agent-browser cookies get example.com
agent-browser storage set mykey "myvalue"
agent-browser storage get mykey

# Close browser
agent-browser close
```

**Node.js:**

```javascript
function openSession(url, sessionName) {
  return execSync(`agent-browser open ${url} --session ${sessionName}`, { encoding: 'utf-8' });
}

function closeSession(sessionName) {
  return execSync(`agent-browser close --session ${sessionName}`, { encoding: 'utf-8' });
}

function manageStorage(action, key, value = null) {
  const cmd = value
    ? `agent-browser storage ${action} ${key} "${value}"`
    : `agent-browser storage ${action} ${key}`;
  return execSync(cmd, { encoding: 'utf-8' }).trim();
}

// Usage
// openSession('https://app.example.com', 'shopping-session');
// manageStorage('set', 'cart-id', '12345');
// const cartId = manageStorage('get', 'cart-id');
```

## Rate limits / Best practices

- Add delays between interactions (1-2 seconds) to allow page rendering
- Use `--wait` flag for actions that trigger navigation or async updates
- Close browser sessions when done to free system resources
- Use `--session` flags to isolate different automation workflows
- Cache snapshots when repeatedly interacting with the same page structure
- Prefer element references (@e1) over selectors for deterministic behavior

## Agent prompt

```text
You have browser automation capability through agent-browser. When a user asks to automate web interactions:

1. Open the URL with `agent-browser open <url>`
2. Get the accessibility snapshot with `agent-browser snapshot` to identify interactive elements
3. Parse the snapshot output to find element references (like @e1, @e2)
4. Use `fill`, `click`, or `type` commands with element references to interact
5. Use

Del mismo repositorio

age-file-encryptionSkill

Encrypt and decrypt files or streams using age — a simple, modern, and secure encryption tool with small explicit keys, passphrase support, SSH key support, post-quantum hybrid keys, and UNIX-style composability. No config options, no footguns.

anonymous-file-uploadSkill

Upload and host files anonymously using decentralized storage with Originless and IPFS.

bulk-github-starSkill

Star all repositories from a GitHub user automatically. Use when: (1) Supporting open source creators, (2) Bulk discovery of useful projects, or (3) Automating GitHub engagement.

changelog-generatorSkill

Automatically creates user-facing changelogs from git commits by analyzing commit history, categorizing changes, and transforming technical commits into clear, customer-friendly release notes. Turns hours of manual changelog writing into minutes of automated generation.

chat-loggerSkill

Log all chat messages to a SQLite database for searchable history and audit. Use when: (1) Building chat history, (2) Auditing conversations, (3) Searching past messages, or (4) User asks to log chats.

check-crypto-address-balanceSkill

Check cryptocurrency wallet balances across multiple blockchains using free public APIs.

city-distanceSkill

Calculate line-of-sight and road distances between two cities using free OpenStreetMap services.

city-tourism-website-builderSkill

Research and create modern, animated tourism websites for cities with historical facts, places to visit, and colorful designs.