Skip to main content
ClaudeWave
Skill423 repo starsupdated 4d ago

kimi-webbridge

Kimi WebBridge enables Claude to control a user's real browser through a local daemon running on port 10086, allowing interaction with authenticated sessions and page content. Use this skill to navigate websites, take screenshots, interact with page elements, fill forms, execute JavaScript, capture network traffic, and manage browser tabs on behalf of the user.

Install in Claude Code
Copy
git clone --depth 1 https://github.com/mxyhi/ok-skills /tmp/kimi-webbridge && cp -r /tmp/kimi-webbridge/kimi-webbridge ~/.claude/skills/kimi-webbridge
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Kimi WebBridge

Control the user's real browser (with their login sessions) via a local daemon at `http://127.0.0.1:10086`.

## Health check (always do this first)

```bash
~/.kimi-webbridge/bin/kimi-webbridge status
```

Then act on the result:

- **`running: true` and `extension_connected: true`** — healthy. Proceed with the tool calls below.
- **Anything else** (command not found, `running: false`, `extension_connected: false`, errors) — **Read `references/operations.md`** in this skill directory. It has the install / start / diagnose routing table.

Don't guess fixes here — every non-healthy state is handled in `references/operations.md`.

## Tools

| Tool | Args | Returns | Note |
|------|------|---------|------|
| `navigate` | `url`, `newTab`(bool), `group_title` | `{success, url, tabId}` | First call opens a tab — see [Tabs](#tabs-and-the-current-tab). `group_title` sets the group's visible label |
| `find_tab` | `url`, `active`(bool) | `{success, url, tabId}` | Select an already-open tab as the current one — see [Tabs](#tabs-and-the-current-tab) |
| `snapshot` | — | `{url, title, tree}` with `@e` refs | **Accessibility tree** (text) — use this to read page content and locate elements |
| `click` | `selector` (@e ref or CSS) | `{success, tag, text}` | Synthetic `el.click()` |
| `fill` | `selector`, `value` | `{success, tag, mode}` | Works on `<input>`/`<textarea>` AND `[contenteditable]` (ProseMirror/Lexical/Slate). `mode` is `"value"` or `"contenteditable"` |
| `evaluate` | `code` (supports async/await) | `{type, value}` | |
| `screenshot` | `format`(png\|jpeg), `quality`(0-100), optional `selector` (@e/CSS), optional `path` | `{format, path, sizeBytes, mimeType}` | Returns a file path, not base64 — see [Screenshots](#screenshots) |
| `network` | `cmd`(start\|stop\|list\|detail), `filter`, `requestId` | request/response data | |
| `upload` | `selector`, `files`(string[]) | `{success, fileCount}` | |
| `save_as_pdf` | `paper_format`, `landscape`, `scale`, `print_background`, optional `path` | `{path, sizeBytes, mimeType, pageTitle}` | Render current page → PDF, returns a file path — see [Save as PDF](#save-the-current-page-as-pdf) |
| `list_tabs` | — | `{success, tabs:[{tabId, url, title, active, groupTitle}]}` | Inspect tabs in the current session |
| `close_tab` | — | `{success, closed: bool}` | Close the current tab in the session |
| `close_session` | — | `{success, closed: int}` | Close all tabs in the session — `closed` is the count. See [Sessions](#sessions) for when to call |

### Tabs and the current tab

Single-tab tools (`snapshot`, `click`, `fill`, `screenshot`, `save_as_pdf`) act on the **current tab** — the one you most recently opened with `navigate` or selected with `find_tab`.

- **Opening pages**: use `newTab:true` when pages should coexist (comparing, cross-referencing); omit it to send the current tab to a new URL.
- **Going back to an earlier tab**: call `find_tab` to make a tab you already opened the current one again. Pass the tab's **full URL** — take it from `list_tabs` or the earlier `navigate` result. A bare root domain (`zhihu.com`) may miss a `www.zhihu.com` tab, so prefer the exact URL. `active:true` picks the tab the user is currently viewing — use it when they say "用我打开的 X" / "在我当前的 X 页面上"; otherwise the leftmost match wins.
- If `find_tab` returns "no open tab found", the page isn't open — `navigate` with `newTab:true` instead.

```bash
curl -s -X POST http://127.0.0.1:10086/command \
  -d '{"action":"find_tab","args":{"url":"https://www.zhihu.com","active":true},"session":"camping-research"}'
```

### Call Format

Every command carries a top-level `session` naming the current task — see [Sessions](#sessions) below. The examples in later sections omit it only for brevity; in real calls always include it.

```bash
curl -s -X POST http://127.0.0.1:10086/command \
  -H 'Content-Type: application/json' \
  -d '{"action":"navigate","args":{"url":"https://example.com","newTab":true,"group_title":"My task"},"session":"my-task"}'
```

## Sessions

**One task = one session = one tab group.** A `session` collects every tab this task opens into a single tab group, so the user sees one group representing "what the agent is doing right now". Pass it as a **top-level field** of the request body (not inside `args`).

Rules:

1. **Pick one session name when the task starts, put it on _every_ command, and never change it mid-task.**
2. **One task uses one session — even across multiple sites.** Searching on Google and then opening results on three different domains all share the same session and land in the same group. **Do not switch session names per site** — that is the #1 cause of fragmented tab groups.
3. Name the session after the **task**, not the site or domain — e.g. `camping-research`, `phone-compare`.
4. `group_title` is the human-readable label shown on the group in the browser. Write it in the user's language (match the conversation — Chinese or English). Pass it on the **first** `navigate`; later calls in the same session don't need it.
5. Use multiple sessions **only when the user asks for several unrelated tasks at once** — one session per task.

```bash
# First tab of the task: set session + human-readable label (in the user's language)
curl -s -X POST http://127.0.0.1:10086/command \
  -d '{"action":"navigate","args":{"url":"https://www.google.com/search?q=tents","newTab":true,"group_title":"Camping gear research"},"session":"camping-research"}'

# Another SITE, SAME task → same session → joins the same group automatically
curl -s -X POST http://127.0.0.1:10086/command \
  -d '{"action":"navigate","args":{"url":"https://www.zhihu.com/search?q=tents","newTab":true},"session":"camping-research"}'

# Every later command carries the same session
curl -s -X POST http://127.0.0.1:10086/command \
  -d '{"action":"snapshot","args":{},"session":"camping-research"}'
```

When the task is finished and the user no longer needs these pages, `close_session`
agent-browserSkill

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction. Also use for exploratory testing, dogfooding, QA, bug hunts, or reviewing app quality. Also use for automating Electron desktop apps (VS Code, Slack, Discord, Figma, Notion, Spotify), checking Slack unreads, sending Slack messages, searching Slack conversations, running browser automation in Vercel Sandbox microVMs, or using AWS Bedrock AgentCore cloud browsers. Prefer agent-browser over any built-in browser automation or web tools.

ai-elementsSkill

Build AI chat interfaces using ai-elements components — conversations, messages, tool displays, prompt inputs, and more. Use when the user wants to build a chatbot, AI assistant UI, or any AI-powered chat interface.

autoresearchSkill

Autonomous iteration loop: modify, verify, keep/discard against any metric

better-iconsSkill

Use when working with icons in any project. Provides CLI for searching 200+ icon libraries (Iconify) and retrieving SVGs. Commands: `better-icons search <query>` to find icons, `better-icons get <id>` to get SVG. Also available as MCP server for AI agents.

browser-traceSkill

Capture a full DevTools-protocol trace of any browser automation — CDP firehose, screenshots, and DOM dumps — then bisect the stream into per-page searchable buckets. Use when the user wants to debug a failed run, audit network/console/DOM activity, attach a trace to an in-progress session, or feed structured per-page summaries back into an agent loop so its next iteration learns from the last one.

cavemanSkill

>

diagnoseSkill

Disciplined diagnosis loop for hard bugs and performance regressions. Reproduce → minimise → hypothesise → instrument → fix → regression-test. Use when user says "diagnose this" / "debug this", reports a bug, says something is broken/throwing/failing, or describes a performance regression.

dogfoodSkill

Systematically explore and test a web application to find bugs, UX issues, and other problems. Use when asked to "dogfood", "QA", "exploratory test", "find issues", "bug hunt", "test this app/site/platform", or review the quality of a web application. Produces a structured report with full reproduction evidence -- step-by-step screenshots, repro videos, and detailed repro steps for every issue -- so findings can be handed directly to the responsible teams.