ClaudeWave
Skill · 990 repo stars · updated today

ha-browser

The ha-browser skill automates Chrome web interactions through the `browser` tool's eight high-level actions, driven by a standard operating loop (status, tabs, snapshot, act) that keeps the agent synchronized with browser state and DOM changes. Use it for tasks that require web automation: form filling, login flows, page scraping, element clicking, or content that depends on JavaScript rendering or cookies. The skill includes automatic stale-reference recovery and guidance for handling authentication, CAPTCHA, and dialog blocks.

Install in Claude Code
git clone --depth 1 https://github.com/shiwenwen/hope-agent /tmp/ha-browser && cp -r /tmp/ha-browser/skills/ha-browser ~/.claude/skills/ha-browser
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Hope Agent Browser — operating loop

The `browser` tool exposes 8 high-level actions over a single Chrome session. The backend talks to Chrome directly over CDP via `chromiumoxide`, so no Node.js is required.

## The standard loop

Run these in order. Step 3 is the only conditional one; do not skip the others. Browsers are stateful, and assumptions get punished.

```
1. browser(action="status")            # never operate blindly
2. browser(action="tabs",  op="list")  # know what's open before opening more
3. browser(action="tabs",  op="new", url=...)  # only if you actually need a fresh tab
4. browser(action="snapshot", format="role")    # get fresh refs
5. browser(action="act", kind=..., ref=..., ...)
6. when in doubt → re-snapshot
```
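
The loop above can be sketched in Python to make the ordering explicit. This is an illustration only: the `browser` callable, the `pick_ref` callback, and the `RecordingBrowser` fake are hypothetical stand-ins (the real tool is invoked by the agent, not imported), and the return shapes are assumptions.

```python
from typing import Any, Callable, Optional


def standard_loop(browser: Callable[..., dict],
                  pick_ref: Callable[[dict], str],
                  kind: str,
                  url: Optional[str] = None,
                  **act_args: Any) -> dict:
    """Run status -> tabs.list -> (tabs.new) -> snapshot -> act, in order."""
    browser(action="status")                     # 1. never operate blindly
    browser(action="tabs", op="list")            # 2. know what is already open
    if url is not None:                          # 3. only if a fresh tab is needed
        browser(action="tabs", op="new", url=url)
    snap = browser(action="snapshot", format="role")  # 4. get fresh refs
    ref = pick_ref(snap)                         # choose a ref from THIS snapshot
    return browser(action="act", kind=kind, ref=ref, **act_args)  # 5. act


class RecordingBrowser:
    """Fake backend (an assumption) that records calls for inspection."""

    def __init__(self) -> None:
        self.calls = []

    def __call__(self, **kwargs: Any) -> dict:
        self.calls.append((kwargs["action"], kwargs.get("op")))
        return {"ok": True}


fake = RecordingBrowser()
standard_loop(fake, pick_ref=lambda snap: "e1", kind="click")
print(fake.calls)
```

Taking the ref from a callback rather than as an argument keeps the ref tied to the snapshot that was just taken, which is the property the loop is meant to protect.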

A typical "fill the login form" flow is:

```
status → tabs.list → (already on the right tab? if not, tabs.select / tabs.new)
       → navigate.go url="https://app.example.com/login"
       → snapshot format=role           # capture refs
       → act kind=fill   ref=<email>   text="me@..."
       → act kind=fill   ref=<password> text="..."
       → act kind=click  ref=<submit>
       → snapshot format=role           # re-snapshot after navigation
       → verify expected element exists
```

## Refs are tied to the snapshot

`ref` is **only** valid against the most recent role snapshot (`snapshot format=role`) for the active tab. The moment the page navigates, the DOM mutates (SPA route change, modal opens/closes), or you switch tabs, refs are stale.

**Re-snapshot when:**

- you just called `navigate.go / .back / .forward / .reload`
- you switched tabs (`tabs.select`)
- an `act` returned an error that looks like "ref not found" / "no such element" / "detached"
- the URL reported in the snapshot output differs from what you expect
- a previous `act` triggered an obvious UI change (modal, page transition, form expansion)
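
The checklist above can be expressed as a single predicate. A minimal sketch, assuming a hypothetical `LastStep` record that the caller fills in after each tool call; the field names are illustrative and not part of the tool.

```python
from dataclasses import dataclass
from typing import Optional

STALE_MARKERS = ("ref not found", "no such element", "detached")
NAVIGATE_OPS = {"go", "back", "forward", "reload"}


@dataclass
class LastStep:
    """Hypothetical summary of the most recent tool call."""
    action: str                   # e.g. "navigate", "tabs", "act"
    op: Optional[str] = None      # e.g. "go", "select"
    error: Optional[str] = None   # error text, if the call failed
    ui_changed: bool = False      # caller's judgement: modal, transition, ...


def needs_resnapshot(step: LastStep,
                     expected_url: Optional[str] = None,
                     snapshot_url: Optional[str] = None) -> bool:
    """Return True when any re-snapshot trigger from the list applies."""
    if step.action == "navigate" and step.op in NAVIGATE_OPS:
        return True
    if step.action == "tabs" and step.op == "select":
        return True
    if step.error and any(m in step.error.lower() for m in STALE_MARKERS):
        return True
    if expected_url and snapshot_url and expected_url != snapshot_url:
        return True
    return step.ui_changed
```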

## Stale-ref auto-recovery — what to expect

The tool tries **one** automatic recovery before bubbling up a stale-ref error: it re-snapshots, looks for an element with the same `role` and `text` (or a substring match) as the original ref, and retries with the new ref. On success the result string ends with `(ref auto-recovered)` so you know it kicked in. On failure you get the original error.

Practical rules:

- If a recovery happened, **verify the next action's prerequisites freshly** — a recovered ref means the DOM rearranged.
- If recovery fails, **re-snapshot manually and re-plan**; don't keep hammering the same `act` call.
- Recovery only kicks in for `act` calls. `navigate`, `tabs.*`, and `control.*` do not retry.
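
The recovery behaviour described above can be approximated as follows. This is a sketch under stated assumptions, not the tool's actual implementation: `StaleRefError`, the node dictionaries with `ref` / `role` / `text` keys, and the direction of the substring match (original text contained in the new element's text) are all hypothetical.

```python
from typing import Callable, Dict, List, Optional


class StaleRefError(Exception):
    """Hypothetical error type standing in for a stale-ref failure."""


def find_replacement(nodes: List[Dict[str, str]], role: str, text: str) -> Optional[str]:
    """Prefer an exact role+text match, then fall back to a substring match."""
    for node in nodes:
        if node["role"] == role and node["text"] == text:
            return node["ref"]
    for node in nodes:
        if node["role"] == role and text and text in node["text"]:
            return node["ref"]
    return None


def act_with_recovery(act: Callable[[str], str],
                      snapshot: Callable[[], List[Dict[str, str]]],
                      ref: str, role: str, text: str) -> str:
    """Try once; on a stale ref, re-snapshot, re-match, and retry exactly once."""
    try:
        return act(ref)
    except StaleRefError as original:
        new_ref = find_replacement(snapshot(), role, text)
        if new_ref is None:
            raise original from None      # failure surfaces the original error
        try:
            return act(new_ref) + " (ref auto-recovered)"
        except StaleRefError:
            raise original from None      # no second recovery attempt
```

The single retry is deliberate: if the re-matched element is also stale, the page is still changing and a manual re-snapshot and re-plan is the safer path.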

## When to stop and ask the user

These five situations are **blocking** — do not guess your way through them. Call `ask_user_question` and wait.

| Signal in the snapshot or error | What you must do |
| --- | --- |
| Login form / email + password / "Sign in" button | Ask the user to sign in, or supply credentials via the right channel. Never type credentials you guessed. |
| 2FA / OTP code prompt | Ask the user for the code (they have the device). |
| CAPTCHA / "I'm not a robot" | Ask the user to solve it. Do not attempt to bypass. |
| Camera / microphone / notification permission prompt | Ask the user — that's a system dialog only they can answer. |
| Browser-native file picker or download confirmation | If you triggered it, handle it with `control.handle_dialog`. If it's a host-driven save dialog, ask the user. |

When you have to stop, the right call is roughly:

```
ask_user_question({
  reason: "Browser flow requires you",
  questions: [{ q: "I see a CAPTCHA on https://...; please solve it, then say 'continue'.", ... }]
})
```
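
To make the stop-and-ask rule concrete, the sketch below scans snapshot text for blocking signals and builds a payload in the shape shown above. The keyword lists and both function names are illustrative assumptions, not the tool's detection logic. The file-picker row is left out because the right response depends on who triggered the dialog.

```python
from typing import Optional

# Keyword lists are illustrative assumptions, not the tool's detection logic.
BLOCKING_SIGNALS = [
    ("captcha", ("captcha", "i'm not a robot")),
    ("2fa", ("2fa", "one-time code", "verification code")),
    ("permission", ("wants to use your camera",
                    "wants to use your microphone",
                    "wants to show notifications")),
    ("login", ("sign in", "password")),
]


def blocking_reason(snapshot_text: str) -> Optional[str]:
    """Return the first blocking category whose keyword appears, else None."""
    lowered = snapshot_text.lower()
    for reason, keywords in BLOCKING_SIGNALS:
        if any(keyword in lowered for keyword in keywords):
            return reason
    return None


def build_question(reason: str, url: str) -> dict:
    """Build an ask_user_question payload (hypothetical helper)."""
    return {
        "reason": "Browser flow requires you",
        "questions": [{"q": f"I see a {reason} block on {url}; "
                            "please resolve it, then say 'continue'."}],
    }
```

A keyword scan like this will miss cases and raise false positives, so treat a match as a prompt to look at the snapshot, not as proof.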

## Tab discipline

Multi-tab work loses refs more than anything else. Two rules keep you sane:

1. **Name your tabs as soon as you open them.** Right after `tabs.new`, jot the `target_id` and the URL/role in your reasoning ("tab A11C = github, tab B22D = jira"). Always pass `target_id` explicitly to `tabs.select` instead of relying on "active".
2. **One snapshot per action burst per tab.** Don't snapshot tab A, switch to tab B for two ops, switch back to A, and reuse the old A refs. Re-snapshot when you come back.
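
Both rules can be tracked with a small piece of bookkeeping. A minimal sketch, assuming a hypothetical `TabRegistry` kept in the agent's own state; the tool does not provide this class.

```python
from typing import Dict, Optional


class TabRegistry:
    """Tracks tab labels and whether the active tab's refs are still usable."""

    def __init__(self) -> None:
        self.labels: Dict[str, str] = {}          # target_id -> human label
        self.active: Optional[str] = None         # currently selected target_id
        self.snapshot_of: Optional[str] = None    # tab the current refs belong to

    def opened(self, target_id: str, label: str) -> None:
        """Rule 1: name the tab as soon as it is opened."""
        self.labels[target_id] = label
        self.select(target_id)

    def select(self, target_id: str) -> None:
        """Always select by explicit target_id; switching invalidates refs."""
        if target_id not in self.labels:
            raise KeyError(f"unknown tab: {target_id}")
        if target_id != self.active:
            self.snapshot_of = None
        self.active = target_id

    def snapshotted(self) -> None:
        """Record that a fresh snapshot was taken on the active tab."""
        self.snapshot_of = self.active

    def refs_valid(self) -> bool:
        """Rule 2: refs are usable only on the tab they were captured on."""
        return self.active is not None and self.snapshot_of == self.active
```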

## When NOT to use `browser`

Browser automation is the most expensive tool you have: every step costs round-trips, every snapshot is a blob of roughly 30 KB, and every screenshot is on the order of a hundred KB. Don't reach for it when something cheaper works:

- **Public web content** → `web_fetch` (HTML/Markdown) or `web_search` first. Open a browser only if the page is JS-rendered, gated by cookies, or you genuinely need to interact (click, fill).
- **One-shot file download** → `web_fetch` saves bytes; `browser.snapshot pdf` is only for the rendered DOM-as-PDF case.
- **API call** → if the site has an API and you can hit it with `exec curl` or `web_fetch`, do that — no browser needed.
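
The guidance above amounts to a small decision procedure. The sketch below is one reading of it, assuming that an available API always wins and that the browser is chosen only when the page is JS-rendered, cookie-gated, or needs interaction; the function and flag names are hypothetical.

```python
def choose_tool(*, has_api: bool = False,
                js_rendered: bool = False,
                cookie_gated: bool = False,
                needs_interaction: bool = False) -> str:
    """Pick the cheapest tool that can do the job."""
    if has_api:
        return "api"          # exec curl or web_fetch against the API
    if js_rendered or cookie_gated or needs_interaction:
        return "browser"      # the cases where cheaper tools cannot work
    return "web_fetch"        # public content and one-shot downloads
```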

## Common pitfalls

- **Two snapshots in a row without anything in between**: you wasted a turn. Snapshot once, then `act` a few times, then re-snapshot.
- **`act.kind=fill` with a ref that points to a `<div>` instead of `<input>`**: the snapshot output annotates `role`; check it before filling. Use `evaluate` to inspect the DOM if unsure.
- **Trusting the URL on a redirect-heavy site**: after `navigate.go`, the page may bounce through several URLs. Re-snapshot after navigation; do not assume the URL in your `navigate.go` argument matches the current page.
- **Forgetting `observe`**: when something silently fails, `observe.kind=console` and `observe.kind=page_errors` often surface the cause for free.

## Common CDP error strings

- `"Cannot find context with specified id"` / `"detached"` / `"no such element"` — stale ref or page replaced. Recovery: re-snapshot.
- `"selected page closed"` — active tab was closed externally. Recovery: `tabs.list` → `tabs.select` on a live tab.
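
The two entries above can be kept as a lookup from error text to the next calls to make. A minimal sketch; the table structure and the `recovery_steps` name are assumptions, and matching is a case-insensitive substring test.

```python
from typing import List

# Error fragments and recoveries copied from the list above.
RECOVERY_TABLE = [
    (("cannot find context with specified id", "detached", "no such element"),
     ["snapshot"]),
    (("selected page closed",),
     ["tabs.list", "tabs.select"]),
]


def recovery_steps(error: str) -> List[str]:
    """Return the ordered recovery calls for a known error, else an empty list."""
    lowered = error.lower()
    for needles, steps in RECOVERY_TABLE:
        if any(needle in lowered for needle in needles):
            return list(steps)
    return []
```

An empty result means the error is not one of the known strings, in which case `observe.kind=console` or `observe.kind=page_errors` is a reasonable next look.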

## Choosing `profile.op=launch profile=`

```
profile=managed       → automation, scrapers, anything that should NOT inherit
                        the user's login state. Ephemeral runner under
                        ~/.hope-agent/browser/managed-runner/, OS-picked
```