macos-vision-mcp

Name: woladi/macos-vision-mcp
Author: woladi

MCP server for Claude Code — Apple Vision OCR & image analysis, fully offline, no API keys

MCP Servers · Official Registry · 0 stars · 0 forks · JavaScript · Updated today

ClaudeWave Trust Score

62/100

· OK

Passed

✓Actively maintained (<30d)
✓Clear description
✓Topics declared

Flags

!No standard license detected

Last scanned: 6/11/2026

Install in Claude Code / Claude Desktop

Method: NPX · macos-vision-mcp

Claude Code CLI

claude mcp add macos-vision-mcp -- npx -y macos-vision-mcp

claude_desktop_config.json (Claude Desktop)

{
  "mcpServers": {
    "macos-vision-mcp": {
      "command": "npx",
      "args": ["-y", "macos-vision-mcp"]
    }
  }
}

1. Run the command above in your terminal (Claude Code), or paste the JSON config into claude_desktop_config.json (Claude Desktop).

2. No `<placeholder>` values appear in the config above, and this server needs no API keys, so there is nothing to replace.

3. Restart Claude. The MCP server and its tools appear automatically.
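
If you would rather script the Claude Desktop step than paste by hand, the merge can be sketched as below. This is a minimal sketch and not part of the package: `addVisionServer` is a hypothetical helper, and it only works on an already-parsed config object, leaving reading and writing `claude_desktop_config.json` to you.

```javascript
// Hypothetical helper: returns a copy of a parsed claude_desktop_config.json
// object with the macos-vision-mcp entry added. Existing servers are kept.
function addVisionServer(config) {
  const current = config ?? {};
  return {
    ...current,
    mcpServers: {
      ...(current.mcpServers ?? {}),
      "macos-vision-mcp": { command: "npx", args: ["-y", "macos-vision-mcp"] },
    },
  };
}

// Example: a config that already declares another (made-up) server.
const merged = addVisionServer({
  mcpServers: { "other-server": { command: "node", args: ["server.js"] } },
});
console.log(JSON.stringify(merged, null, 2));
```

Returning a new object instead of mutating the input makes it easy to diff the result against the original before writing anything to disk.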

Use cases

Creative · Dev Tools · AI / ML

About

MCP Servers overview

# macos-vision-mcp

<p align="center">
  <img src=".github/assets/hero.jpg" alt="macos-vision-mcp — local, private, offline OCR for MCP-compatible LLMs" width="1200">
</p>

Local OCR & image analysis for any MCP client — private, offline, no API keys.

[![npm version](https://img.shields.io/npm/v/macos-vision-mcp?style=flat-square&logo=npm&logoColor=white)](https://www.npmjs.com/package/macos-vision-mcp)
[![License: MIT](https://img.shields.io/badge/License-MIT-ffd60a?style=flat-square)](LICENSE)
[![macOS 13.0+](https://img.shields.io/badge/macOS-13.0%2B-0078d7?logo=apple&logoColor=white&style=flat-square)](https://developer.apple.com/documentation/vision)
[![No API Key](https://img.shields.io/badge/no%20API%20key-required-brightgreen?style=flat-square)](#)
[![Offline](https://img.shields.io/badge/offline-yes-blue?style=flat-square)](#)

Pre-extracts text and image data locally before your AI ever sees it — cutting token usage by ~97% on real documents and returning structured paragraphs, lines, and bounding boxes so the model can reconstruct the document into Markdown, HTML, DOCX, or any other format. Files never leave your Mac: no cloud API, no API keys, no network requests.

## What you get

- OCR for images and PDFs (JPG, PNG, HEIC, TIFF, multi-page PDF) via Apple Vision Framework.
- ~97% token reduction: a 44-page PDF costs ~2,400 tokens instead of ~73,500.
- Reading-order paragraphs + raw text blocks with bounding boxes — rich structure for the model to reconstruct the document into any output format (Markdown, HTML, DOCX, JSON), not a lossy plain-text dump.
- Face detection, barcode/QR reading, and image classification — all on-device.
- Full document pipeline: OCR + faces + barcodes + rectangles in a single tool call.
- Works with Claude Code, Claude Desktop, and Cursor — any MCP-compatible client.
- No files uploaded to any server — processing stays entirely on your Mac.
- 100% offline after `npm install` — powered by Apple Vision Framework, same engine as Live Text in Photos.app.

## ❌ Without / ✅ With

❌ **Without macos-vision-mcp:**

- Sending a 44-page PDF costs ~73,500 tokens
- Every image, invoice, or contract goes through a cloud API
- Sensitive documents leave your machine on every request

✅ **With macos-vision-mcp:**

- Local Apple Vision pre-extracts text before Claude ever sees it
- ~2,400 tokens for the same 44-page PDF — 97% fewer
- Files never leave your Mac
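
The 97% figure follows directly from the two token counts quoted above. A quick check using only those numbers (the function name is illustrative):

```javascript
// Fraction of tokens saved when `after` tokens replace `before` tokens.
function tokenReduction(before, after) {
  return 1 - after / before;
}

// Figures quoted in this README for the 44-page PDF.
const saved = tokenReduction(73500, 2400);
console.log(`${(saved * 100).toFixed(1)}% fewer tokens`); // prints "96.7% fewer tokens"
```

The exact value is about 96.7%, which rounds to the ~97% stated here.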

## Privacy layer

macos-vision-mcp acts as a local pre-processing layer between your documents and the cloud. Useful for:

- Legal documents, contracts, NDAs
- Financial reports, invoices, internal spreadsheets
- Medical records or any GDPR-sensitive content
- Any situation where you want to extract structured data locally before deciding what (if anything) to send upstream

Instead of sending the raw document to your AI, you extract the text and structure locally first. The model then works only with the extracted text — never the original file.

## Quick Start

**Add to your MCP client (example for Claude Code):**

```sh
claude mcp add macos-vision-mcp -- npx -y macos-vision-mcp
```

Restart your client. `npx` fetches the package on first run, caches it, and the tools appear automatically — no separate install step. This is the convention used by most MCP servers and recommended by Anthropic, Cursor, and other clients.

> **Note:** On first run, `macos-vision` downloads prebuilt Swift helper binaries (`vision-helper`, `pdf-helper`) from its GitHub Releases (~300 KB, ~1–2s). Subsequent invocations hit the npx cache and start instantly. Xcode Command Line Tools are only required as a fallback when the download can't reach the network — set `MACOS_VISION_SKIP_DOWNLOAD=1` to force local compilation with `swiftc`.

> **Prefer instant cold-starts (no npx cache lookup)?** Install globally with `npm install -g macos-vision-mcp` and use the alternative config shown at the bottom of [Configuration](#configuration).

## Available Tools

| Tool               | What it does                                                                                                                                                                                                                                                   | Example prompt                                       |
| ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------- |
| `ocr_image`        | Extract text from an image or PDF (JPG, PNG, HEIC, TIFF, PDF). Returns plain text, or per-page paragraphs + text blocks with `lineId` / `paragraphId` and bounding boxes. Accepts `start_page` / `max_pages` for partial PDF OCR.                              | "Read the text from ~/Desktop/screenshot.png"        |
| `detect_faces`     | Detect human faces and return their count and positions.                                                                                                                                                                                                       | "How many people are in this photo?"                 |
| `detect_barcodes`  | Read QR codes, EAN, UPC, Code128, PDF417, Aztec, and other 1D/2D codes.                                                                                                                                                                                        | "What does the QR code in /tmp/qr.jpg say?"          |
| `detect_document`  | Detect the four corner points of a document in a photo (paper, receipt, ID). Useful as a crop / deskew hint before OCR.                                                                                                                                        | "Find the document corners in ~/Desktop/receipt.jpg" |
| `classify_image`   | Classify image content into 1000+ categories with confidence scores.                                                                                                                                                                                           | "What is in this image?"                             |
| `analyze_document` | Returns structured JSON with reading-order paragraphs, raw text blocks (bbox / confidence), faces, barcodes, and rectangles — ready for the model to reconstruct into Markdown, HTML, or anything else. Also accepts `start_page` / `max_pages` for long PDFs. | "Reconstruct ~/Desktop/scan.pdf as clean Markdown"   |
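
For long PDFs, `start_page` and `max_pages` let a client OCR a document in windows. The sketch below builds MCP `tools/call` requests for such a walk. Several details are assumptions that this README does not confirm: the file argument is named `path`, `start_page` is zero-based (chosen to match the zero-based `page` field in the output schema), and the helper `ocrWindowRequests` is made up for illustration.

```javascript
// Builds one JSON-RPC `tools/call` request per window of pages.
// ASSUMPTIONS: the argument name `path` and a zero-based `start_page` are
// guesses; check the tool's input schema (via `tools/list`) before relying on them.
function ocrWindowRequests(path, totalPages, maxPages) {
  if (!Number.isInteger(maxPages) || maxPages < 1) {
    throw new RangeError("maxPages must be a positive integer");
  }
  const requests = [];
  for (let start = 0, id = 1; start < totalPages; start += maxPages, id++) {
    requests.push({
      jsonrpc: "2.0",
      id,
      method: "tools/call",
      params: {
        name: "ocr_image",
        arguments: {
          path,
          start_page: start,
          max_pages: Math.min(maxPages, totalPages - start),
        },
      },
    });
  }
  return requests;
}

// A 44-page PDF read 20 pages at a time yields three requests.
const requests = ocrWindowRequests("/tmp/report.pdf", 44, 20);
console.log(requests.map((r) => r.params.arguments));
```

In an MCP client such as Claude Code you would normally just ask for a page range in the prompt; a request builder like this is only relevant if you drive the server from your own code.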

## Usage

Use the tool name explicitly in your prompt to guarantee local processing:

**Extract text from an image or PDF:**

```
Use ocr_image to extract text from ~/Desktop/invoice.pdf
```

**Detect faces in a photo:**

```
Use detect_faces on ~/Photos/team.jpg and tell me how many people are in it
```

**Classify image content:**

```
Use classify_image on ~/Downloads/unknown.jpg
```

**Full document analysis + reconstruction:**

```
Use analyze_document on ~/Desktop/report.pdf and reconstruct it as clean Markdown
```

The tool returns structured JSON; the model picks the output format you ask for (Markdown, HTML, DOCX outline, etc.) without any extra dependencies — no Ollama, no cloud LLM, no extra tooling.

## Example workflows

Real-world combinations that work out of the box once the server is connected:

- **"Convert PDF → clean Markdown for LLM"** — `analyze_document` returns reading-order paragraphs and bounding boxes; the model renders Markdown ready to drop into a docs site, knowledge base, or RAG pipeline.
- **"Extract invoice data locally before sending to GPT"** — pull line items, totals, vendor, and dates from the PDF locally with `analyze_document`, then send only the structured JSON upstream. The original document never leaves your Mac.
- **"Scan receipts → JSON → expense tracker"** — `ocr_image` on a phone photo, the model normalizes amount / date / merchant, and pipes the result straight into your expense tool's API.
- **"Decode a QR code from a screenshot"** — `detect_barcodes` returns the decoded value plus symbology in one round trip.
- **"Crop a photo of a paper form before OCR"** — `detect_document` returns the four corner points so you (or a downstream tool) can deskew and crop the image before reading the text.

### Output schema (analyze_document)

```jsonc
{
  "source": { "path": "...", "pageCount": 1, "isPdf": false },
  "pages": [
    {
      "page": 0,
      // primary surface for reconstruction — reading-order paragraphs joined with "\n"
      "paragraphs": [
        { "paragraphId": 0, "lineIds": [0], "text": "ACME COFFEE" },
        { "paragraphId": 1, "lineIds": [1, 2], "text": "12 Main St\nPortland, OR" },
      ],
      // spatial fallback — raw blocks with page-local 0–1 bbox, confidence, line/paragraph membership
      "textBlocks": [
        {
          "text": "ACME COFFEE",
          "lineId": 0,
          "paragraphId": 0,
          "confidence": 0.99,
          "bbox": { "x": 0.21, "y": 0.04, "width": 0.58, "height": 0.06 },
        },
      ],
      "faces": [],
      "barcodes": [],
      "rectangles": [],
    },
  ],
  "summary": {
    "totalTextBlocks": 8,
    "totalParagraphs": 2,
    "totalFaces": 0,
    "totalBarcodes": 0,
    "totalRectangles": 0,
  },
}
```

Use `paragraphs[].text` for the 95% case (rebuild Markdown/HTML/plain text directly). Reach for `textBlocks[]` when you need spatial context — multi-column layouts, tables, forms, IDs.
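
As a rough illustration of the `paragraphs[].text` path, the sketch below turns a result into Markdown using only fields from the schema above. The function name, the hard-line-break handling, and the `---` page separator are illustrative choices, not something the tool prescribes; in normal use the model does this reconstruction itself.

```javascript
// Joins reading-order paragraphs into Markdown: one blank line between
// paragraphs and a horizontal rule between pages.
function toMarkdown(result) {
  return result.pages
    .map((page) =>
      page.paragraphs
        // A "\n" inside a paragraph becomes a Markdown hard line break.
        .map((p) => p.text.split("\n").join("  \n"))
        .join("\n\n")
    )
    .join("\n\n---\n\n");
}

// The two paragraphs from the schema example.
const sample = {
  pages: [
    {
      page: 0,
      paragraphs: [
        { paragraphId: 0, lineIds: [0], text: "ACME COFFEE" },
        { paragraphId: 1, lineIds: [1, 2], text: "12 Main St\nPortland, OR" },
      ],
    },
  ],
};
console.log(toMarkdown(sample));
```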

**Notes:**

- `ocr_image` in `blocks` mode returns the same per-page shape minus the detection sections: `{ pages: [{ page, paragraphs, textBlocks }] }`.
- PDFs are processed page by page. All coordinates are page-local (0–1), and `paragraphId` / `lineId` reset on every page.
- Face, barcode, and rectangle detection on PDFs is best-effort: the underlying binary analyzes the file as a whole.
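
Because all coordinates are page-local and normalized to 0–1, mapping a text block onto a rendered page is one multiplication per axis. The sketch below shows the conversion under one loudly stated assumption: the origin is the top-left corner with y growing downward. The README does not say which convention the tool emits, and Apple Vision itself reports boxes from a bottom-left origin, so verify this against real output. `bboxToPixels` is a hypothetical helper name.

```javascript
// Converts a normalized page-local bbox (0–1) to pixel coordinates.
// ASSUMPTION: top-left origin, y grows downward. If the tool keeps Apple
// Vision's bottom-left origin, use top = (1 - y - height) * pageHeight instead.
function bboxToPixels(bbox, pageWidth, pageHeight) {
  return {
    left: Math.round(bbox.x * pageWidth),
    top: Math.round(bbox.y * pageHeight),
    width: Math.round(bbox.width * pageWidth),
    height: Math.round(bbox.height * pageHeight),
  };
}

// The "ACME COFFEE" block from the schema example on a 1000 × 2000 px page.
const px = bboxToPixels({ x: 0.21, y: 0.04, width: 0.58, height: 0.06 }, 1000, 2000);
console.log(px); // { left: 210, top: 80, width: 580, height: 120 }
```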

Topics

apple-vision · claude-code · macos · mcp · ocr · offline

Frequently asked

What people ask about macos-vision-mcp

What is woladi/macos-vision-mcp?

woladi/macos-vision-mcp is an MCP server for the Claude AI ecosystem: Apple Vision OCR and image analysis for Claude Code, fully offline, with no API keys. It has 0 GitHub stars and was last updated today.

How do I install macos-vision-mcp?

The quickest route is the NPX command from the install block on this page: `claude mcp add macos-vision-mcp -- npx -y macos-vision-mcp`. You can also clone the repository (https://github.com/woladi/macos-vision-mcp) and follow the README instructions on GitHub.

Is woladi/macos-vision-mcp safe to use?

Our security agent has analyzed woladi/macos-vision-mcp and assigned a Trust Score of 62/100 (tier: OK). See the full breakdown of passed checks and flags on this page.

Who maintains woladi/macos-vision-mcp?

woladi/macos-vision-mcp is maintained by woladi. The last recorded GitHub activity is from today, with 0 open issues.

Are there alternatives to macos-vision-mcp?

Yes. On ClaudeWave you can browse similar MCP servers at /categories/mcp, sorted by popularity or recent activity.
