Skill1.3k repo starsupdated today

ha-browser

The ha-browser skill automates Chrome web interactions through an eight-action operating loop (status, tabs, snapshot, act) that maintains synchronization with browser state and DOM changes. Use it when handling tasks requiring web automation: form filling, login flows, page scraping, element clicking, or content requiring JavaScript rendering and cookies. The skill includes automatic stale-reference recovery and guidance for handling authentication, CAPTCHA, and dialog blocks.

View source Repository: hope-agent

Install in Claude Code

Copy

git clone --depth 1 https://github.com/shiwenwen/hope-agent /tmp/ha-browser && cp -r /tmp/ha-browser/skills/ha-browser ~/.claude/skills/ha-browser

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Hope Agent Browser — operating loop

The `browser` tool exposes 8 high-level actions. Default backend is Hope Agent's Chrome Extension + Native Messaging Host, which can control the user's real Chrome tabs after they install the extension and native host. If the extension is unavailable, generic browsing can fall back to the managed/user_attach CDP backend, but real Chrome tab/session tasks must fail closed and ask the user to install or enable the extension.

## The standard loop

Run these in order; never skip a step. Browsers are stateful — assumptions get punished.

```
1. browser(action="status")            # never operate blindly
2. browser(action="tabs",  op="list")  # know what's open before opening more
3. browser(action="tabs",  op="new", url=...)  # only if you actually need a fresh tab
4. browser(action="snapshot", format="role")    # get fresh refs
5. browser(action="act", kind=..., ref=..., ...)
6. when in doubt → re-snapshot
```

When the user explicitly asks for their current Chrome, an already-open tab, their logged-in session, or browser extensions/cookies from their daily Chrome, use `tabs.open_user_tabs` and `tabs.claim` first. Do not launch a managed CDP profile and pretend it is the user's Chrome.

Real Chrome access uses the normal Hope Agent tool approval flow. `tabs.open_user_tabs`, `tabs.claim`, extension numeric-id `tabs.select`, `observe.kind=downloads`, `control.download_cancel`, and `control.raw_cdp` may ask unless the session policy, AllowAlways, Smart mode, or YOLO allows them.

A typical "fill the login form" flow is:

```
status → tabs.list → (already on the right tab? if not, tabs.select / tabs.new)
       → navigate.go url="https://app.example.com/login"
       → snapshot format=role           # capture refs
       → act kind=fill   ref=<email>   text="me@..."
       → act kind=fill   ref=<password> text="..."
       → act kind=click  ref=<submit>
       → snapshot format=role           # re-snapshot after navigation
       → verify expected element exists
```

## Refs are tied to the snapshot

`ref` is **only** valid against the most recent `snapshot.role` for the active tab. The moment the page navigates, the DOM mutates (SPA route change, modal opens/closes), or you switch tab — refs are stale.

**Re-snapshot when:**

- you just called `navigate.go / .back / .forward / .reload`
- you switched tab (`tabs.select`)
- an `act` returned an error that looks like "ref not found" / "no such element" / "detached"
- the URL bar in the snapshot output differs from what you expect
- a previous `act` triggered an obvious UI change (modal, page transition, form expansion)

## Stale-ref auto-recovery — what to expect

The tool tries **one** automatic recovery before bubbling up a stale-ref error: it re-snapshots, looks for an element with the same `role` and `text` (or a substring match) as the original ref, and retries with the new ref. On success the result string ends with `(ref auto-recovered)` so you know it kicked in. On failure you get the original error.

Practical rules:

- If a recovery happened, **verify the next action's prerequisites freshly** — a recovered ref means the DOM rearranged.
- If recovery fails, **resnapshot manually and re-plan** — don't keep hammering the same `act` call.
- Recovery only kicks in for `act.kind`. `navigate`, `tabs.*`, and `control.*` do not retry.

## When to stop and ask the user

These five situations are **blocking** — do not guess your way through them. Call `ask_user_question` and wait.

| Signal in the snapshot or error | What you must do |
| --- | --- |
| Login form / email + password / "Sign in" button | Ask the user to sign in, or supply credentials via the right channel. Never type credentials you guessed. |
| 2FA / OTP code prompt | Ask the user for the code (they have the device). |
| CAPTCHA / "I'm not a robot" | Ask the user to solve it. Do not attempt to bypass. |
| Camera / microphone / notification permission prompt | Ask the user — that's a system dialog only they can answer. |
| Browser-native file picker or download confirmation | If you triggered it, `control.handle_dialog`. If it's a host-driven save dialog, ask the user. |

When you have to stop, the right call is roughly:

```
ask_user_question({
  reason: "Browser flow requires you",
  questions: [{ q: "I see a CAPTCHA on https://...; please solve it, then say 'continue'.", ... }]
})
```

## Tab discipline

Multi-tab work loses refs more than anything else. Two rules keep you sane:

1. **Name your tabs as soon as you open them.** Right after `tabs.new`, jot the `target_id` and the URL/role in your reasoning ("tab A11C = github, tab B22D = jira"). Always pass `target_id` explicitly to `tabs.select` instead of relying on "active".
2. **One snapshot per action burst per tab.** Don't snapshot tab A, switch to tab B for two ops, switch back to A, and reuse the old A refs. Re-snapshot when you come back.

## Real Chrome tabs

Use the extension-backed tab lease protocol whenever the task depends on the user's real Chrome state:

```
status
tabs.open_user_tabs
tabs.claim target_id="<chrome-tab-id>"
snapshot / act / observe
tabs.finalize
```

Rules:

- `tabs.claim` takes temporary control of a real user tab. Release it with `tabs.release` or `tabs.finalize` when the task ends.
- `tabs.select` with a numeric extension tab id also activates and controls that real Chrome tab. Prefer `tabs.claim` when your intent is explicit takeover; use `tabs.select` for tab switching after you know the target id.
- `tabs.finalize` for a claimed user tab must not close the tab by default; it releases Hope Agent control.
- `tabs.new` creates a Hope-controlled automation tab. `tabs.finalize` closes agent-created tabs unless their target id is listed in `keep`.
- If a tab is already claimed by another Hope session, do not steal it unless the user explicitly asked to take over; then pass `steal:true`.
- If the extension is missing or disabled, real Chrome tasks are blocked. T

More from this repository

code-reviewSkill

email-draftSkill

Use when the user asks to draft, polish, translate, or reply to an email. Produces a clean draft with subject line, greeting, body, and sign-off, plus a pre-send self-check.

feishuSkill

Use when the user mentions 飞书 / Feishu / Lark workspace operations: docx (云文档) read/write, bitable (多维表格) records / views / dashboards, drive (云盘) upload/download, wiki (知识库) link resolution, approval (审批) instance create/cancel/query, calendar (日历) event create/list/update + attendees, contact (联系人) user/department lookup, hire (招聘) job/talent/application listing. Trigger on phrases like 'OKR 周报', '把这份文档发到飞书云盘', '给团队拉个评审会议', '查 [姓名] 的联系方式', '撤销那条审批', '/wiki 链接', or any request that mentions a feishu / lark URL / token (doxcn.../bascn.../wikcn.../boxcn.../om_...).

ha-find-skillsSkill

Discover and install third-party skills from external registries when the user needs a capability that no currently-active skill covers. Trigger when: (1) the user explicitly asks 'find a skill for X', 'is there a skill that does X', 'install a skill to X', (2) the user requests a well-known integration (Slack, Notion, Trello, GitHub, Hue, Sonos, iMessage, weather, TTS, transcription …) that isn't in the active skill catalog, (3) you are about to hand-write ad-hoc shell / API code for a domain that almost certainly has a published skill. Do NOT trigger if an active skill already covers the need — scan the visible skill catalog first.

ha-logsSkill

Self-service diagnostics — query Hope Agent's local SQLite databases (logs / sessions / background jobs) directly via the `exec` tool to investigate problems, analyze usage, and locate root causes. Trigger on: user reports something broken / failing / slow / stuck / not responding ('X 不工作', 'X 报错', 'X 卡住', '为什么 X 失败', 'why did X fail', 'show me the logs', 'check what happened'); ad-hoc data analysis ('this week's token usage', '最近调用最多的工具', 'how many subagent runs failed', 'tool error rate', 'find sessions where X happened'); verifying a fix ('did the error stop after I changed Y'). Use BEFORE asking the user to paste log snippets — the data is on disk, query it directly. Read-only — SELECT only, never UPDATE/DELETE/INSERT/DROP.

ha-mac-controlSkill

Hope Agent native macOS desktop control — the standard `mac_control` status / diagnostics / apps / dock / spaces / snapshot / visual / windows / menu / clipboard / dialog loop, target-first action rules, no-blind-coordinate policy, and recovery for stale AX/window/menu/dialog state. Load whenever using `mac_control`, or when the user asks to control local Mac apps, Dock, Spaces, click/type/menu/window/dialog/clipboard, automate Finder/TextEdit/System Settings, visually locate UI, or says 控制 Mac, macOS 自动化, 点按钮, 打开应用, Dock, Space, 关闭窗口, 菜单点击, 视觉定位.

ha-self-diagnosisSkill

Self-understanding and issue reporting for Hope Agent itself. Use when the user asks how Hope Agent works internally, asks about its own source code/docs/runtime behavior, reports a bug/failure/slowness/crash, asks to diagnose logs, or asks to create/submit a GitHub issue for a bug, feature request, or improvement (including when there is no bug). Chinese triggers: 自查, 了解自己, 自我诊断, 排查 Hope Agent, 提交 issue, 需求 issue, 功能改进.

ha-self-updateSkill

Check for and install Hope Agent updates through conversation. Use whenever the user asks about upgrades, new versions, release notes, or reports a bug that might already be fixed upstream — phrases like 'upgrade Hope Agent', 'update hope agent', 'check for new version', '升级一下', '有新版本吗', '帮我升级', 'is there a newer build', 'check release notes', 'install the latest'. Also use proactively when an `app_update(action=\"check\")` result shows `has_update: true` and the user hasn't been told yet. Covers all three formfactors: desktop GUI bundle (DMG/MSI/AppImage), `hope-agent server` daemon installed via Homebrew/Scoop/AUR/apt/dnf, and headless single-binary deployments. The upgrade is always user-confirmed via `ask_user_question` — never silent.