Skip to main content
ClaudeWave
Harusame64 avatar
Harusame64

desktop-touch-mcp

Ver en GitHub

Windows computer-use MCP server: drive any desktop app via semantic discover-then-act targeting (entities + leases, not pixel coordinates), with per-action perception guards, a native Rust UIA engine, and Chrome CDP. Works with Claude, Cursor, and any MCP client.

MCP ServersRegistry oficial1 estrellas1 forksTypeScriptMITActualizado today
ClaudeWave Trust Score
87/100
Trusted
Passed
  • Open-source license (MIT)
  • Actively maintained (<30d)
  • Clear description
  • Topics declared
Last scanned: 6/11/2026
Install in Claude Code / Claude Desktop
Method: NPX · @harusame64/desktop-touch-mcp
Claude Code CLI
claude mcp add desktop-touch-mcp -- npx -y @harusame64/desktop-touch-mcp
claude_desktop_config.json (Claude Desktop)
{
  "mcpServers": {
    "desktop-touch-mcp": {
      "command": "npx",
      "args": ["-y", "@harusame64/desktop-touch-mcp"]
    }
  }
}
1. Run the command above in your terminal (Claude Code), or paste the JSON config into claude_desktop_config.json (Claude Desktop).
2. Replace any <placeholder> values with your API keys or paths.
3. Restart Claude. The MCP server and its tools appear automatically.
Casos de uso

Resumen de MCP Servers

# desktop-touch-mcp

[![desktop-touch-mcp MCP server](https://glama.ai/mcp/servers/Harusame64/desktop-touch-mcp/badges/card.svg)](https://glama.ai/mcp/servers/Harusame64/desktop-touch-mcp)

[日本語](README.ja.md)

> **Computer-use MCP server for Windows.** Lets Claude, Cursor, or any MCP client see and operate your Windows 10/11 desktop — screenshots, UI Automation, Chrome CDP, keyboard / mouse, terminal — with **semantic discover-then-act targeting** that avoids pixel-coordinate guessing, and **per-action perception guards** that catch wrong-window typing before it happens.

```bash
npx -y @harusame64/desktop-touch-mcp
```

29 tools, native Rust engine (UIA in 2 ms), zero-config PowerShell fallback, full CJK support, MIT licensed. Add the snippet above to your Claude / Cursor / VS Code Copilot config and Claude can drive Notepad, Excel, Chrome, Windows Terminal, and any other app on your machine.

> **Why this over pixel-clicking?** Two ideas run through every tool: **discover-then-act** — `desktop_discover` returns interactive entities with short-lived leases instead of raw coordinates, so `desktop_act` operates on *what* you mean, not *where* it was — and **per-action perception guards** that verify the target window's identity and bounds before input lands, catching wrong-window typing and stale-coordinate clicks before they happen.
>
> Under the hood: an **82× average speedup** from the Rust native engine (UIA focus queries in 2 ms, SSE2-accelerated image diffing at 13–15×), with a transparent PowerShell fallback when the engine is absent. The npm launcher fetches only the GitHub Release tag matching the installed version and verifies the Windows runtime zip before extraction.

---

## Features

- **⚡ High-performance Rust Native Core** — The UIA bridge and image-diff engine are written in Rust (`napi-rs` + `windows-rs`) and loaded as a native `.node` addon. Direct COM calls from a dedicated MTA thread eliminate PowerShell process spawning — `getFocusedElement` completes in **2 ms** (160× faster), and `getUiElements` returns full trees in **~100 ms** with a batch BFS algorithm that minimizes cross-process RPC. Image-diff operations use **SSE2 SIMD** for 13–15× throughput. When the native engine is unavailable, every function transparently falls back to PowerShell — zero config required.
- **🎯 Set-of-Marks (SoM) visual fallback** — Games, RDP sessions, and non-accessible Electron apps return clickable elements even when UIA is completely blind. `screenshot(detail="text")` automatically detects UIA sparsity and activates a Hybrid Non-CDP pipeline: Rust-powered grayscale + bilinear upscale → Windows OCR → clustering → red bounding-box annotation with numbered badges (`[1]`, `[2]`…). Two parallel representations returned: a visual PNG for spatial orientation and a semantic `elements[]` list with `clickAt` coords — no CDP required.
- **🔁 One-call confirmation on visual-only targets** — On UIA-blind targets (Electron, PWAs, games, custom canvases, RDP windows), `desktop_act` can fold the post-action confirmation into its own response: an optional `roiCapture` carrying a PNG crop of *just the region that changed* plus a lease-less preview of the controls now visible there. The agent confirms what its click did and finds the next target without a separate `desktop_state` + `screenshot`. On visual-only targets it is **on by default** for a visible change (`returnCapture:"on-change"`); pass `returnCapture:"never"` to suppress it, or `"always"` to force it. Never attached on structured targets (browser/CDP, UIA-rich native), where `desktop_state` is cheaper and exact — so those responses are unchanged.
- **LLM-native design** — Built around how LLMs think, not how humans click. `run_macro` batches multiple operations into a single API call; `diffMode` sends only the windows that changed since the last frame. Minimal tokens, minimal round-trips.
- **Reactive Perception Graph** — Register a `lensId` for a window or browser tab, pass it to action tools, and get guard-checked `post.perception` feedback after each action. It reduces repeated `screenshot` / `desktop_state` calls and prevents wrong-window typing or stale-coordinate clicks.
- **Full CJK support** — Uses Win32 `GetWindowTextW` for window titles, avoiding nut-js garbling. IME bypass input supported for Japanese/Chinese/Korean environments.
- **3-tier token reduction** — `detail="image"` (~443 tok) / `detail="text"` (~100–300 tok) / `diffMode=true` (~160 tok). Send pixels only when you actually need to see them.
- **1:1 coordinate mode** — `dotByDot=true` captures at native resolution (WebP). Image pixel = screen coordinate — no scale math needed. With `origin`+`scale` passed to `mouse_click`, the server converts coords for you — eliminating off-by-one / scale bugs.
- **Browser capture data reduction** — `grayscale=true` (~50% size), `dotByDotMaxDimension=1280` (auto-scaled with coord preservation), and `windowTitle + region` sub-crops help exclude browser chrome and other irrelevant pixels. Typical reduction for heavy captures: 50–70%.
- **Chromium smart fallback** — `detail="text"` on Chrome/Edge/Brave auto-skips UIA (prohibitively slow there) and runs Windows OCR. `hints.chromiumGuard` + `hints.ocrFallbackFired` flag the path taken.
- **UIA element extraction** — `detail="text"` returns button names and `clickAt` coords as JSON. Claude can click the right element without ever looking at a screenshot.
- **Auto-dock CLI** — `window_dock(action='dock')` snaps any window to a screen corner with always-on-top. Set `DESKTOP_TOUCH_DOCK_TITLE='@parent'` to auto-dock the terminal hosting Claude on MCP startup — the process-tree walker finds the right window regardless of title.
- **Emergency stop (Failsafe)** — Move the mouse to the **top-left corner (within 10px of 0,0)** to immediately terminate the MCP server.

---

## Requirements

| | |
|---|---|
| OS | Windows 10 / 11 (64-bit) |
| Node.js | v20+ recommended (tested on v22+) |
| PowerShell | 5.1+ (bundled with Windows) — used only as fallback when the Rust native engine is unavailable |
| Claude CLI | `claude` command must be available |

> **Note:** nut-js native bindings require the Visual C++ Redistributable.
> Download from [Microsoft](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist) if not already installed.

---

## Installation

```bash
npx -y @harusame64/desktop-touch-mcp
```

The npm launcher resolves runtime strictly by npm package version. For package `X.Y.Z`, it fetches only GitHub Release tag `vX.Y.Z`, downloads `desktop-touch-mcp-windows.zip`, verifies its SHA256 digest, and only then expands it under `%USERPROFILE%\.desktop-touch-mcp`. Verified cached releases are reused on later runs.

Set `DESKTOP_TOUCH_MCP_HOME` to override the cache root directory.

> **On a shared or CI network?** The first run reads the GitHub Releases API to
> locate the runtime zip. The anonymous limit is 60 requests/hour per IP, which a
> shared public address (CI runners, office NAT) can exhaust before your download
> even starts. Set `GITHUB_TOKEN` (or `GH_TOKEN`) in the environment and the
> launcher authenticates the request, raising the limit to 5,000 requests/hour.
> No token is needed on an ordinary home connection.

> **Running the launcher from a source checkout?** A source build's
> `bin/launcher.js` carries a placeholder integrity hash (`sha256: "PENDING"`)
> instead of a finalized one. Rather than download and run an unverified runtime,
> the launcher fails closed — this guard stops an accidentally published or
> unfinalized launcher from silently starting unverified code. Published npm
> releases always ship a real SHA256, so end users never see this. If you are
> intentionally running the launcher from source, set
> `DESKTOP_TOUCH_MCP_ALLOW_UNVERIFIED=1` to skip integrity verification
> (development only).

### Register with Claude CLI

Add to `~/.claude.json` under `mcpServers`:

```json
{
  "mcpServers": {
    "desktop-touch": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@harusame64/desktop-touch-mcp"]
    }
  }
}
```

**No system prompt needed.** The command reference is automatically injected into Claude via the MCP `initialize` response's `instructions` field.

### Register with other clients (HTTP mode)

Clients that require an HTTP endpoint (GPT Desktop, VS Code Copilot, Cursor, etc.) can use the built-in Streamable HTTP transport:

```bash
npx -y @harusame64/desktop-touch-mcp --http
# or with a custom port:
npx -y @harusame64/desktop-touch-mcp --http --port 8080
```

The server starts at `http://127.0.0.1:23847/mcp` (localhost only). Register the URL in your MCP client settings. A health check is available at `http://127.0.0.1:<port>/health`.

In HTTP mode the system tray icon shows the active URL and provides quick-copy and open-in-browser shortcuts.

### Development install

```bash
git clone https://github.com/Harusame64/desktop-touch-mcp.git
cd desktop-touch-mcp
npm install
```

Build after install:

```bash
npm run build
```

For a local checkout, register the built server directly:

```json
{
  "mcpServers": {
    "desktop-touch": {
      "type": "stdio",
      "command": "node",
      "args": ["D:/path/to/desktop-touch-mcp/dist/index.js"]
    }
  }
}
```

> **Note:** Replace `D:/path/to/desktop-touch-mcp` with the actual path where you cloned this repository.

---

## Tools (29 Optimized Tools)

> 📖 **Full Reference**: [`docs/system-overview.md`](docs/system-overview.md) — Exhaustive guide on parameters, return schemas, and coordinate math.

### 🌐 World-Graph V2 (Primary Path)
| Tool | Description |
|---|---|
| `desktop_discover` | Observe the desktop. Returns interactive entities with leases (UIA, CDP, Terminal, Visual SoM). |
| `desktop_act` | Perform actions (click, type, drag, select) on entities via lease validation. Returns semantic diffs — plus an optional `roiCapture` (changed-region PNG + next-target preview) on visual-only targets
agentanthropicautomationcdpclaudeclaude-codecomputer-usedesktop-automationllmmcpmcp-servermodel-context-protocolnodejsrpascreenshottypescriptui-automationwindows

Lo que la gente pregunta sobre desktop-touch-mcp

¿Qué es Harusame64/desktop-touch-mcp?

+

Harusame64/desktop-touch-mcp es mcp servers para el ecosistema de Claude AI. Windows computer-use MCP server: drive any desktop app via semantic discover-then-act targeting (entities + leases, not pixel coordinates), with per-action perception guards, a native Rust UIA engine, and Chrome CDP. Works with Claude, Cursor, and any MCP client. Tiene 1 estrellas en GitHub y se actualizó por última vez today.

¿Cómo se instala desktop-touch-mcp?

+

Puedes instalar desktop-touch-mcp clonando el repositorio (https://github.com/Harusame64/desktop-touch-mcp) o siguiendo las instrucciones del README en GitHub. ClaudeWave también te ofrece bloques de instalación rápida en esta misma página.

¿Es seguro usar Harusame64/desktop-touch-mcp?

+

Nuestro agente de seguridad ha analizado Harusame64/desktop-touch-mcp y le ha asignado un Trust Score de 87/100 (tier: Trusted). Revisa el desglose completo de comprobaciones superadas y flags en esta página.

¿Quién mantiene Harusame64/desktop-touch-mcp?

+

Harusame64/desktop-touch-mcp es mantenido por Harusame64. La última actividad registrada en GitHub es de today, con 0 issues abiertos.

¿Hay alternativas a desktop-touch-mcp?

+

Sí. En ClaudeWave puedes explorar mcp servers similares en /categories/mcp, ordenados por popularidad o actividad reciente.

Despliega desktop-touch-mcp en tu cloud

Lleva este repo a producción en minutos. Cada plataforma genera su propio entorno con variables de entorno editables.

¿Mantienes este repo? Añade un badge a tu README

Pega el badge en tu README de GitHub para mostrar que está auditado por ClaudeWave. Cada badge enlaza de vuelta a esta página y muestra el Trust Score actual.

Featured on ClaudeWave: Harusame64/desktop-touch-mcp
[![Featured on ClaudeWave](https://claudewave.com/api/badge/harusame64-desktop-touch-mcp)](https://claudewave.com/repo/harusame64-desktop-touch-mcp)
<a href="https://claudewave.com/repo/harusame64-desktop-touch-mcp"><img src="https://claudewave.com/api/badge/harusame64-desktop-touch-mcp" alt="Featured on ClaudeWave: Harusame64/desktop-touch-mcp" width="320" height="64" /></a>

Más MCP Servers

Alternativas a desktop-touch-mcp