Windows computer-use MCP server: drive any desktop app via semantic discover-then-act targeting (entities + leases, not pixel coordinates), with per-action perception guards, a native Rust UIA engine, and Chrome CDP. Works with Claude, Cursor, and any MCP client.
- ✓Open-source license (MIT)
- ✓Actively maintained (<30d)
- ✓Clear description
- ✓Topics declared
claude mcp add desktop-touch-mcp -- npx -y @harusame64/desktop-touch-mcp{
"mcpServers": {
"desktop-touch-mcp": {
"command": "npx",
"args": ["-y", "@harusame64/desktop-touch-mcp"]
}
}
}MCP Servers overview
# desktop-touch-mcp
[](https://glama.ai/mcp/servers/Harusame64/desktop-touch-mcp)
[日本語](README.ja.md)
> **Computer-use MCP server for Windows.** Lets Claude, Cursor, or any MCP client see and operate your Windows 10/11 desktop — screenshots, UI Automation, Chrome CDP, keyboard / mouse, terminal — with **semantic discover-then-act targeting** that avoids pixel-coordinate guessing, and **per-action perception guards** that catch wrong-window typing before it happens.
```bash
npx -y @harusame64/desktop-touch-mcp
```
29 tools, native Rust engine (UIA in 2 ms), zero-config PowerShell fallback, full CJK support, MIT licensed. Add the snippet above to your Claude / Cursor / VS Code Copilot config and Claude can drive Notepad, Excel, Chrome, Windows Terminal, and any other app on your machine.
> **Why this over pixel-clicking?** Two ideas run through every tool: **discover-then-act** — `desktop_discover` returns interactive entities with short-lived leases instead of raw coordinates, so `desktop_act` operates on *what* you mean, not *where* it was — and **per-action perception guards** that verify the target window's identity and bounds before input lands, catching wrong-window typing and stale-coordinate clicks before they happen.
>
> Under the hood: an **82× average speedup** from the Rust native engine (UIA focus queries in 2 ms, SSE2-accelerated image diffing at 13–15×), with a transparent PowerShell fallback when the engine is absent. The npm launcher fetches only the GitHub Release tag matching the installed version and verifies the Windows runtime zip before extraction.
---
## Features
- **⚡ High-performance Rust Native Core** — The UIA bridge and image-diff engine are written in Rust (`napi-rs` + `windows-rs`) and loaded as a native `.node` addon. Direct COM calls from a dedicated MTA thread eliminate PowerShell process spawning — `getFocusedElement` completes in **2 ms** (160× faster), and `getUiElements` returns full trees in **~100 ms** with a batch BFS algorithm that minimizes cross-process RPC. Image-diff operations use **SSE2 SIMD** for 13–15× throughput. When the native engine is unavailable, every function transparently falls back to PowerShell — zero config required.
- **🎯 Set-of-Marks (SoM) visual fallback** — Games, RDP sessions, and non-accessible Electron apps return clickable elements even when UIA is completely blind. `screenshot(detail="text")` automatically detects UIA sparsity and activates a Hybrid Non-CDP pipeline: Rust-powered grayscale + bilinear upscale → Windows OCR → clustering → red bounding-box annotation with numbered badges (`[1]`, `[2]`…). Two parallel representations returned: a visual PNG for spatial orientation and a semantic `elements[]` list with `clickAt` coords — no CDP required.
- **🔁 One-call confirmation on visual-only targets** — On UIA-blind targets (Electron, PWAs, games, custom canvases, RDP windows), `desktop_act` can fold the post-action confirmation into its own response: an optional `roiCapture` carrying a PNG crop of *just the region that changed* plus a lease-less preview of the controls now visible there. The agent confirms what its click did and finds the next target without a separate `desktop_state` + `screenshot`. On visual-only targets it is **on by default** for a visible change (`returnCapture:"on-change"`); pass `returnCapture:"never"` to suppress it, or `"always"` to force it. Never attached on structured targets (browser/CDP, UIA-rich native), where `desktop_state` is cheaper and exact — so those responses are unchanged.
- **LLM-native design** — Built around how LLMs think, not how humans click. `run_macro` batches multiple operations into a single API call; `diffMode` sends only the windows that changed since the last frame. Minimal tokens, minimal round-trips.
- **Reactive Perception Graph** — Register a `lensId` for a window or browser tab, pass it to action tools, and get guard-checked `post.perception` feedback after each action. It reduces repeated `screenshot` / `desktop_state` calls and prevents wrong-window typing or stale-coordinate clicks.
- **Full CJK support** — Uses Win32 `GetWindowTextW` for window titles, avoiding nut-js garbling. IME bypass input supported for Japanese/Chinese/Korean environments.
- **3-tier token reduction** — `detail="image"` (~443 tok) / `detail="text"` (~100–300 tok) / `diffMode=true` (~160 tok). Send pixels only when you actually need to see them.
- **1:1 coordinate mode** — `dotByDot=true` captures at native resolution (WebP). Image pixel = screen coordinate — no scale math needed. With `origin`+`scale` passed to `mouse_click`, the server converts coords for you — eliminating off-by-one / scale bugs.
- **Browser capture data reduction** — `grayscale=true` (~50% size), `dotByDotMaxDimension=1280` (auto-scaled with coord preservation), and `windowTitle + region` sub-crops help exclude browser chrome and other irrelevant pixels. Typical reduction for heavy captures: 50–70%.
- **Chromium smart fallback** — `detail="text"` on Chrome/Edge/Brave auto-skips UIA (prohibitively slow there) and runs Windows OCR. `hints.chromiumGuard` + `hints.ocrFallbackFired` flag the path taken.
- **UIA element extraction** — `detail="text"` returns button names and `clickAt` coords as JSON. Claude can click the right element without ever looking at a screenshot.
- **Auto-dock CLI** — `window_dock(action='dock')` snaps any window to a screen corner with always-on-top. Set `DESKTOP_TOUCH_DOCK_TITLE='@parent'` to auto-dock the terminal hosting Claude on MCP startup — the process-tree walker finds the right window regardless of title.
- **Emergency stop (Failsafe)** — Move the mouse to the **top-left corner (within 10px of 0,0)** to immediately terminate the MCP server.
---
## Requirements
| | |
|---|---|
| OS | Windows 10 / 11 (64-bit) |
| Node.js | v20+ recommended (tested on v22+) |
| PowerShell | 5.1+ (bundled with Windows) — used only as fallback when the Rust native engine is unavailable |
| Claude CLI | `claude` command must be available |
> **Note:** nut-js native bindings require the Visual C++ Redistributable.
> Download from [Microsoft](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist) if not already installed.
---
## Installation
```bash
npx -y @harusame64/desktop-touch-mcp
```
The npm launcher resolves runtime strictly by npm package version. For package `X.Y.Z`, it fetches only GitHub Release tag `vX.Y.Z`, downloads `desktop-touch-mcp-windows.zip`, verifies its SHA256 digest, and only then expands it under `%USERPROFILE%\.desktop-touch-mcp`. Verified cached releases are reused on later runs.
Set `DESKTOP_TOUCH_MCP_HOME` to override the cache root directory.
> **On a shared or CI network?** The first run reads the GitHub Releases API to
> locate the runtime zip. The anonymous limit is 60 requests/hour per IP, which a
> shared public address (CI runners, office NAT) can exhaust before your download
> even starts. Set `GITHUB_TOKEN` (or `GH_TOKEN`) in the environment and the
> launcher authenticates the request, raising the limit to 5,000 requests/hour.
> No token is needed on an ordinary home connection.
> **Running the launcher from a source checkout?** A source build's
> `bin/launcher.js` carries a placeholder integrity hash (`sha256: "PENDING"`)
> instead of a finalized one. Rather than download and run an unverified runtime,
> the launcher fails closed — this guard stops an accidentally published or
> unfinalized launcher from silently starting unverified code. Published npm
> releases always ship a real SHA256, so end users never see this. If you are
> intentionally running the launcher from source, set
> `DESKTOP_TOUCH_MCP_ALLOW_UNVERIFIED=1` to skip integrity verification
> (development only).
### Register with Claude CLI
Add to `~/.claude.json` under `mcpServers`:
```json
{
"mcpServers": {
"desktop-touch": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@harusame64/desktop-touch-mcp"]
}
}
}
```
**No system prompt needed.** The command reference is automatically injected into Claude via the MCP `initialize` response's `instructions` field.
### Register with other clients (HTTP mode)
Clients that require an HTTP endpoint (GPT Desktop, VS Code Copilot, Cursor, etc.) can use the built-in Streamable HTTP transport:
```bash
npx -y @harusame64/desktop-touch-mcp --http
# or with a custom port:
npx -y @harusame64/desktop-touch-mcp --http --port 8080
```
The server starts at `http://127.0.0.1:23847/mcp` (localhost only). Register the URL in your MCP client settings. A health check is available at `http://127.0.0.1:<port>/health`.
In HTTP mode the system tray icon shows the active URL and provides quick-copy and open-in-browser shortcuts.
### Development install
```bash
git clone https://github.com/Harusame64/desktop-touch-mcp.git
cd desktop-touch-mcp
npm install
```
Build after install:
```bash
npm run build
```
For a local checkout, register the built server directly:
```json
{
"mcpServers": {
"desktop-touch": {
"type": "stdio",
"command": "node",
"args": ["D:/path/to/desktop-touch-mcp/dist/index.js"]
}
}
}
```
> **Note:** Replace `D:/path/to/desktop-touch-mcp` with the actual path where you cloned this repository.
---
## Tools (29 Optimized Tools)
> 📖 **Full Reference**: [`docs/system-overview.md`](docs/system-overview.md) — Exhaustive guide on parameters, return schemas, and coordinate math.
### 🌐 World-Graph V2 (Primary Path)
| Tool | Description |
|---|---|
| `desktop_discover` | Observe the desktop. Returns interactive entities with leases (UIA, CDP, Terminal, Visual SoM). |
| `desktop_act` | Perform actions (click, type, drag, select) on entities via lease validation. Returns semantic diffs — plus an optional `roiCapture` (changed-region PNG + next-target preview) on visual-only targetsWhat people ask about desktop-touch-mcp
What is Harusame64/desktop-touch-mcp?
+
Harusame64/desktop-touch-mcp is mcp servers for the Claude AI ecosystem. Windows computer-use MCP server: drive any desktop app via semantic discover-then-act targeting (entities + leases, not pixel coordinates), with per-action perception guards, a native Rust UIA engine, and Chrome CDP. Works with Claude, Cursor, and any MCP client. It has 1 GitHub stars and was last updated today.
How do I install desktop-touch-mcp?
+
You can install desktop-touch-mcp by cloning the repository (https://github.com/Harusame64/desktop-touch-mcp) or following the README instructions on GitHub. ClaudeWave also provides quick install blocks on this page.
Is Harusame64/desktop-touch-mcp safe to use?
+
Our security agent has analyzed Harusame64/desktop-touch-mcp and assigned a Trust Score of 87/100 (tier: Trusted). See the full breakdown of passed checks and flags on this page.
Who maintains Harusame64/desktop-touch-mcp?
+
Harusame64/desktop-touch-mcp is maintained by Harusame64. The last recorded GitHub activity is from today, with 0 open issues.
Are there alternatives to desktop-touch-mcp?
+
Yes. On ClaudeWave you can browse similar mcp servers at /categories/mcp, sorted by popularity or recent activity.
Deploy desktop-touch-mcp to your cloud
Ship this repo to production in minutes. Each platform spins up its own environment with editable env vars.
Maintain this repo? Add a badge to your README
Drop the badge into your GitHub README to show it's tracked on ClaudeWave. Each badge links back to this page and reflects the live Trust Score.
[](https://claudewave.com/repo/harusame64-desktop-touch-mcp)<a href="https://claudewave.com/repo/harusame64-desktop-touch-mcp"><img src="https://claudewave.com/api/badge/harusame64-desktop-touch-mcp" alt="Featured on ClaudeWave: Harusame64/desktop-touch-mcp" width="320" height="64" /></a>More MCP Servers
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
An open-source AI agent that brings the power of Gemini directly into your terminal.
The fastest path to AI-powered full stack observability, even for lean teams.
🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!
⭐AI-driven public opinion & trend monitor with multi-platform aggregation, RSS, and smart alerts.🎯 告别信息过载,你的 AI 舆情监控助手与热点筛选工具!聚合多平台热点 + RSS 订阅,支持关键词精准筛选。AI 智能筛选新闻 + AI 翻译 + AI 分析简报直推手机,也支持接入 MCP 架构,赋能 AI 自然语言对话分析、情感洞察与趋势预测等。支持 Docker ,数据本地/云端自持。集成微信/飞书/钉钉/Telegram/邮件/ntfy/bark/slack 等渠道智能推送。