Skip to main content
ClaudeWave
Skill839 repo starsupdated 2d ago

agent-desktop

agent-desktop is a command-line tool that enables AI agents to observe and control macOS desktop applications by exposing their accessibility trees as structured JSON with reference-based element identifiers. Use it when building autonomous agents that need to interact with native applications through programmatic UI observation and control, rather than building agents to use the tool directly.

Install in Claude Code
Copy
git clone --depth 1 https://github.com/lahfir/agent-desktop /tmp/agent-desktop && cp -r /tmp/agent-desktop/skills/agent-desktop ~/.claude/skills/agent-desktop
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# agent-desktop

CLI tool enabling AI agents to observe and control desktop applications via native OS accessibility trees.

**Core principle:** agent-desktop is NOT an AI agent. It is a tool that AI agents invoke. It outputs structured JSON with ref-based element identifiers. The observation-action loop lives in the calling agent.

## Installation

```bash
npm install -g agent-desktop
# or
bun install -g --trust agent-desktop
```

Requires macOS 12+ with Accessibility permission granted to your terminal. Screen Recording permission is also required for screenshots.

## Reference Files

Detailed documentation is split into focused reference files. Read them as needed:

| Reference | Contents |
|-----------|----------|
| `references/commands-observation.md` | snapshot, find, get, is, screenshot, list-surfaces — all flags, output examples |
| `references/commands-interaction.md` | click, type, set-value, select, toggle, scroll, drag, keyboard, mouse — choosing the right command |
| `references/commands-system.md` | launch, close, windows, clipboard, wait, batch, status, permissions, version |
| `references/workflows.md` | 12 common patterns: forms, menus, dialogs, scroll-find, drag-drop, async wait, anti-patterns |
| `references/macos.md` | macOS permissions/TCC, AX API internals, smart activation chain, surfaces, Notification Center, troubleshooting |

## The Observe-Act Loop (Progressive Skeleton Traversal)

Use **progressive skeleton traversal** as the default approach. It reduces token consumption 78-96% for dense apps by exploring the UI in two phases: a shallow skeleton overview, then targeted drill-downs into regions of interest.

```
1. SKELETON → agent-desktop snapshot --skeleton --app "App" -i --compact
   Parse the overview. Identify the region containing your target.
   Regions show children_count (e.g., "Sidebar" with children_count: 42).
   Named containers at truncation boundary have refs for drill-down.
   Keep the returned snapshot_id.

2. DRILL    → agent-desktop snapshot --root @e3 --snapshot <snapshot_id> -i --compact
   Expand the target region. Now you see its interactive elements.

3. ACT      → agent-desktop click @e12 --snapshot <snapshot_id>  (or type, select, toggle...)

4. VERIFY   → agent-desktop snapshot --root @e3 --snapshot <snapshot_id> -i --compact
   Re-drill the same region to confirm the state change.
   Scoped invalidation: only @e3's subtree refs are replaced.

5. REPEAT   → Continue drilling other regions or acting as needed.
```

**When to skip skeleton and use full snapshot instead:**
- Simple apps with few elements (Finder, Calculator, TextEdit)
- You already know the exact element name — use `find` instead
- Surface snapshots (menus, sheets, alerts) — these are already focused

**When skeleton shines:**
- Dense Electron apps (Slack, VS Code, Discord, Notion)
- Any app where full snapshot exceeds ~50 refs
- Multi-region workflows (sidebar + main content + toolbar)

## Ref System

- Refs assigned depth-first: `@e1`, `@e2`, `@e3`...
- Only interactive elements get refs: button, textfield, checkbox, link, menuitem, tab, slider, combobox, treeitem, cell
- In skeleton mode, named/described containers at truncation boundary also get refs (drill-down targets with empty `available_actions`)
- Static text, groups, containers remain in tree for context but have no ref
- Refs are deterministic within a snapshot but NOT stable across snapshots if UI changed
- Every snapshot returns `snapshot_id`; ref-consuming commands accept `--snapshot <snapshot_id>`
- `last_refmap.json` is only a latest-snapshot inspection artifact. The command path uses snapshot-scoped storage.
- After any action that changes UI, re-drill the affected region or re-snapshot
- **Scoped invalidation:** re-drilling `--root @e3` only replaces refs from @e3's previous drill — refs from other regions and the skeleton itself are preserved

## JSON Output Contract

Every command returns a JSON envelope on stdout:

**Success:** `{ "version": "2.0", "ok": true, "command": "snapshot", "data": { ... } }`
**Error:** `{ "version": "2.0", "ok": false, "command": "click", "error": { "code": "STALE_REF", "message": "...", "suggestion": "..." } }`

Exit codes: `0` success, `1` structured error, `2` argument error.

### Error Codes

| Code | Meaning | Recovery |
|------|---------|----------|
| `PERM_DENIED` | Accessibility or Screen Recording permission not granted | Grant the named permission in System Settings |
| `ELEMENT_NOT_FOUND` | Ref cannot be resolved against the live UI | Re-run snapshot, use fresh ref |
| `APP_NOT_FOUND` | App not running | Launch it first |
| `ACTION_FAILED` | AX action rejected | Try an explicit alternative command |
| `ACTION_NOT_SUPPORTED` | Element can't do this | Use different command |
| `STALE_REF` | Ref from old snapshot | Re-run snapshot |
| `SNAPSHOT_NOT_FOUND` | Snapshot ID is missing or expired | Run `snapshot` again and use the returned ID |
| `POLICY_DENIED` | A physical/headed path was blocked | Use an explicit mouse/focus/keyboard command if physical interaction is intended |
| `WINDOW_NOT_FOUND` | No matching window | Check app name, use list-windows |
| `PLATFORM_NOT_SUPPORTED` | Adapter method not implemented on this platform | Use a supported platform adapter |
| `TIMEOUT` | Wait condition not met | Increase --timeout |
| `INVALID_ARGS` | Bad arguments | Check command syntax |
| `NOTIFICATION_NOT_FOUND` | Notification index no longer exists | Re-run list-notifications |

## Command Quick Reference (54 commands)

### Observation
```
agent-desktop snapshot --skeleton --app "App" -i --compact  # Skeleton overview (preferred)
agent-desktop snapshot --root @e3 -i --compact              # Drill into region
agent-desktop snapshot --app "App" -i                       # Full tree (simple apps)
agent-desktop snapshot --app "App" --surface menu -i        # Surface snapshot
agent-desktop screenshot --app "App" out.png                # PNG screenshot
agent-desktop find --