agent-desktop
agent-desktop is a command-line tool that lets AI agents observe and control macOS desktop applications by exposing their accessibility trees as structured JSON with ref-based element identifiers. Use it when building autonomous agents that need to observe and control native applications programmatically. It is a tool that agents invoke, not an agent itself.
Install in Claude Code
`git clone --depth 1 https://github.com/lahfir/agent-desktop /tmp/agent-desktop && cp -r /tmp/agent-desktop/skills/agent-desktop ~/.claude/skills/agent-desktop`
Then open a new Claude Code session; the skill loads automatically.
Definition
SKILL.md
# agent-desktop
CLI tool enabling AI agents to observe and control desktop applications via native OS accessibility trees.
**Core principle:** agent-desktop is NOT an AI agent. It is a tool that AI agents invoke. It outputs structured JSON with ref-based element identifiers. The observation-action loop lives in the calling agent.
## Installation
```bash
npm install -g agent-desktop
# or
bun install -g --trust agent-desktop
```
Requires macOS 12+ with Accessibility permission granted to your terminal. Screen Recording permission is also required for screenshots.
## Reference Files
Detailed documentation is split into focused reference files. Read them as needed:
| Reference | Contents |
|-----------|----------|
| `references/commands-observation.md` | snapshot, find, get, is, screenshot, list-surfaces — all flags, output examples |
| `references/commands-interaction.md` | click, type, set-value, select, toggle, scroll, drag, keyboard, mouse — choosing the right command |
| `references/commands-system.md` | launch, close, windows, clipboard, wait, batch, status, permissions, version |
| `references/workflows.md` | 12 common patterns: forms, menus, dialogs, scroll-find, drag-drop, async wait, anti-patterns |
| `references/macos.md` | macOS permissions/TCC, AX API internals, smart activation chain, surfaces, Notification Center, troubleshooting |
## The Observe-Act Loop (Progressive Skeleton Traversal)
Use **progressive skeleton traversal** as the default approach. For dense apps it cuts token consumption by 78-96% because the UI is explored in two phases: a shallow skeleton overview, then targeted drill-downs into regions of interest.
```
1. SKELETON → agent-desktop snapshot --skeleton --app "App" -i --compact
   Parse the overview. Identify the region containing your target.
   Regions show children_count (e.g., "Sidebar" with children_count: 42).
   Named containers at truncation boundary have refs for drill-down.
   Keep the returned snapshot_id.
2. DRILL    → agent-desktop snapshot --root @e3 --snapshot <snapshot_id> -i --compact
   Expand the target region. Now you see its interactive elements.
3. ACT      → agent-desktop click @e12 --snapshot <snapshot_id> (or type, select, toggle...)
4. VERIFY   → agent-desktop snapshot --root @e3 --snapshot <snapshot_id> -i --compact
   Re-drill the same region to confirm the state change.
   Scoped invalidation: only @e3's subtree refs are replaced.
5. REPEAT   → Continue drilling other regions or acting as needed.
```
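The loop above can be driven from the calling agent roughly as follows. This is a minimal Python sketch, not part of agent-desktop. The helper names (`skeleton_cmd`, `drill_cmd`, `click_cmd`, `observe_act_verify`) are hypothetical, and reading `snapshot_id` from `data["snapshot_id"]` is an assumption about where the envelope carries it. The `pick_region` and `pick_target` callbacks stand in for the agent's own reasoning over the returned trees.

```python
import json
import subprocess
from typing import Any, Callable, Dict, List

BIN = "agent-desktop"


def skeleton_cmd(app: str) -> List[str]:
    """Step 1: shallow skeleton overview of the whole app."""
    return [BIN, "snapshot", "--skeleton", "--app", app, "-i", "--compact"]


def drill_cmd(root_ref: str, snapshot_id: str) -> List[str]:
    """Steps 2 and 4: expand (or re-expand) one region of the skeleton."""
    return [BIN, "snapshot", "--root", root_ref,
            "--snapshot", snapshot_id, "-i", "--compact"]


def click_cmd(ref: str, snapshot_id: str) -> List[str]:
    """Step 3: click a ref resolved against the same snapshot."""
    return [BIN, "click", ref, "--snapshot", snapshot_id]


def run(argv: List[str]) -> Dict[str, Any]:
    """Run one command and decode the JSON envelope printed on stdout."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return json.loads(proc.stdout)


def observe_act_verify(
    app: str,
    pick_region: Callable[[Dict[str, Any]], str],
    pick_target: Callable[[Dict[str, Any]], str],
    runner: Callable[[List[str]], Dict[str, Any]] = run,
) -> Dict[str, Any]:
    """One pass of skeleton -> drill -> act -> verify; returns the verify envelope."""
    skeleton = runner(skeleton_cmd(app))
    # Assumption: the snapshot ID sits directly under "data".
    snapshot_id = skeleton["data"]["snapshot_id"]
    region_ref = pick_region(skeleton["data"])
    region = runner(drill_cmd(region_ref, snapshot_id))
    target_ref = pick_target(region["data"])
    runner(click_cmd(target_ref, snapshot_id))
    # Re-drill the same region to observe the state change.
    return runner(drill_cmd(region_ref, snapshot_id))
```

Passing `runner` as a parameter keeps the loop testable without a live desktop: a stub can record the argument lists and return canned envelopes.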
**When to skip skeleton and use full snapshot instead:**
- Simple apps with few elements (Finder, Calculator, TextEdit)
- You already know the exact element name — use `find` instead
- Surface snapshots (menus, sheets, alerts) — these are already focused
**When skeleton shines:**
- Dense Electron apps (Slack, VS Code, Discord, Notion)
- Any app where full snapshot exceeds ~50 refs
- Multi-region workflows (sidebar + main content + toolbar)
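The two lists above amount to a small decision rule. The sketch below is one possible encoding in Python. The function name, the return labels, and the order in which the checks are applied are assumptions made for illustration. Only the skeleton default and the "~50 refs" threshold come from the text.

```python
from typing import Optional

REF_THRESHOLD = 50  # the "~50 refs" rule of thumb from the text


def choose_observation(
    known_element: bool = False,
    surface: bool = False,
    estimated_refs: Optional[int] = None,
) -> str:
    """Pick an observation strategy; the labels are hypothetical, not CLI flags."""
    if known_element:
        # The exact element name is known: search for it directly.
        return "find"
    if surface:
        # Menus, sheets, and alerts are already focused.
        return "surface-snapshot"
    if estimated_refs is not None and estimated_refs <= REF_THRESHOLD:
        # Small apps fit comfortably in one full snapshot.
        return "full-snapshot"
    # Default: progressive skeleton traversal.
    return "skeleton"
```

When nothing is known about the app's size, the rule falls through to the skeleton, which matches the default recommended above.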
## Ref System
- Refs assigned depth-first: `@e1`, `@e2`, `@e3`...
- Only interactive elements get refs: button, textfield, checkbox, link, menuitem, tab, slider, combobox, treeitem, cell
- In skeleton mode, named/described containers at truncation boundary also get refs (drill-down targets with empty `available_actions`)
- Static text, groups, containers remain in tree for context but have no ref
- Refs are deterministic within a snapshot but NOT stable across snapshots if UI changed
- Every snapshot returns `snapshot_id`; ref-consuming commands accept `--snapshot <snapshot_id>`
- `last_refmap.json` is only a latest-snapshot inspection artifact. The command path uses snapshot-scoped storage.
- After any action that changes UI, re-drill the affected region or re-snapshot
- **Scoped invalidation:** re-drilling `--root @e3` only replaces refs from @e3's previous drill — refs from other regions and the skeleton itself are preserved
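To make the numbering rule concrete, here is a small Python sketch that assigns refs to a toy tree. It only illustrates the rule as described and is not the tool's implementation. The node shape (`role`, `children` keys) and the choice of pre-order traversal are assumptions.

```python
from typing import Any, Dict, Optional

# Roles listed above as receiving refs.
INTERACTIVE_ROLES = {
    "button", "textfield", "checkbox", "link", "menuitem",
    "tab", "slider", "combobox", "treeitem", "cell",
}


def assign_refs(
    node: Dict[str, Any],
    refs: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Walk the tree depth-first (pre-order) and number interactive nodes."""
    if refs is None:
        refs = {}
    if node.get("role") in INTERACTIVE_ROLES:
        refs["@e%d" % (len(refs) + 1)] = node
    for child in node.get("children", []):
        assign_refs(child, refs)
    return refs


tree = {"role": "window", "children": [
    {"role": "button", "name": "Save"},
    {"role": "group", "children": [{"role": "textfield", "name": "Title"}]},
    {"role": "statictext", "name": "Status"},   # stays in the tree, gets no ref
    {"role": "link", "name": "Help"},
]}
refs = assign_refs(tree)
```

Because the numbering depends on traversal order, inserting or removing an element earlier in the tree shifts every later ref. That is why refs cannot be reused across snapshots once the UI has changed.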
## JSON Output Contract
Every command returns a JSON envelope on stdout:
**Success:** `{ "version": "2.0", "ok": true, "command": "snapshot", "data": { ... } }`
**Error:** `{ "version": "2.0", "ok": false, "command": "click", "error": { "code": "STALE_REF", "message": "...", "suggestion": "..." } }`
Exit codes: `0` success, `1` structured error, `2` argument error.
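A calling agent will typically unwrap this envelope before doing anything else. The following Python sketch shows one way to do that using only the fields named above. `AgentDesktopError` and `unwrap` are hypothetical names, not part of the tool.

```python
import json
from typing import Any, Optional


class AgentDesktopError(Exception):
    """Raised when an envelope reports ok: false."""

    def __init__(self, command: Optional[str], code: Optional[str],
                 message: Optional[str], suggestion: Optional[str] = None):
        super().__init__("%s failed with %s: %s" % (command, code, message))
        self.command = command
        self.code = code
        self.suggestion = suggestion


def unwrap(stdout: str) -> Any:
    """Return the data payload on success, raise AgentDesktopError otherwise."""
    envelope = json.loads(stdout)
    if envelope.get("ok"):
        return envelope.get("data")
    err = envelope.get("error") or {}
    raise AgentDesktopError(
        envelope.get("command"),
        err.get("code"),
        err.get("message"),
        err.get("suggestion"),
    )
```

Keeping `code` and `suggestion` on the exception lets the agent branch on the error code and surface the tool's own hint without reparsing the output.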
### Error Codes
| Code | Meaning | Recovery |
|------|---------|----------|
| `PERM_DENIED` | Accessibility or Screen Recording permission not granted | Grant the named permission in System Settings |
| `ELEMENT_NOT_FOUND` | Ref cannot be resolved against the live UI | Re-run snapshot, use fresh ref |
| `APP_NOT_FOUND` | App not running | Launch it first |
| `ACTION_FAILED` | AX action rejected | Try an explicit alternative command |
| `ACTION_NOT_SUPPORTED` | Element can't do this | Use different command |
| `STALE_REF` | Ref from old snapshot | Re-run snapshot |
| `SNAPSHOT_NOT_FOUND` | Snapshot ID is missing or expired | Run `snapshot` again and use the returned ID |
| `POLICY_DENIED` | A physical/headed path was blocked | Use an explicit mouse/focus/keyboard command if physical interaction is intended |
| `WINDOW_NOT_FOUND` | No matching window | Check app name, use list-windows |
| `PLATFORM_NOT_SUPPORTED` | Adapter method not implemented on this platform | Use a supported platform adapter |
| `TIMEOUT` | Wait condition not met | Increase --timeout |
| `INVALID_ARGS` | Bad arguments | Check command syntax |
| `NOTIFICATION_NOT_FOUND` | Notification index no longer exists | Re-run list-notifications |
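The recovery column can be folded into a lookup that the calling agent consults after a failed command. This Python sketch is an assumption-laden illustration: the action labels are hypothetical names chosen here, and only the error codes and their grouping come from the table.

```python
# Map each documented error code to a coarse recovery action (labels are hypothetical).
RECOVERY = {
    "PERM_DENIED": "grant_permission",
    "ELEMENT_NOT_FOUND": "resnapshot",
    "STALE_REF": "resnapshot",
    "SNAPSHOT_NOT_FOUND": "resnapshot",
    "APP_NOT_FOUND": "launch_app",
    "ACTION_FAILED": "try_other_command",
    "ACTION_NOT_SUPPORTED": "try_other_command",
    "POLICY_DENIED": "use_explicit_physical_command",
    "WINDOW_NOT_FOUND": "check_windows",
    "PLATFORM_NOT_SUPPORTED": "unsupported",
    "TIMEOUT": "increase_timeout",
    "INVALID_ARGS": "fix_arguments",
    "NOTIFICATION_NOT_FOUND": "relist_notifications",
}


def recovery_action(code: str) -> str:
    """Return the recovery label for a code, or "abort" for unknown codes."""
    return RECOVERY.get(code, "abort")
```

Three of the thirteen codes lead to the same step, taking a fresh snapshot, so an agent can treat them as a single retry path.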
## Command Quick Reference (54 commands)
### Observation
```
agent-desktop snapshot --skeleton --app "App" -i --compact # Skeleton overview (preferred)
agent-desktop snapshot --root @e3 -i --compact # Drill into region
agent-desktop snapshot --app "App" -i # Full tree (simple apps)
agent-desktop snapshot --app "App" --surface menu -i # Surface snapshot
agent-desktop screenshot --app "App" out.png # PNG screenshot
agent-desktop find --
```