ClaudeWave
Skill, 618 repo stars, updated 8d ago

matlab-ocr

matlab-ocr is a Claude Code skill that extracts text from images using MATLAB's Computer Vision Toolbox ocr function with preprocessing pipelines. Use it for reading text in documents, photographs, displays, labels, and datasets, including multi-language recognition and specialized fonts. Do not use for handwriting, artistic text, CAPTCHAs, complex document layout analysis, real-time video processing, or images without text.

Install in Claude Code
git clone --depth 1 https://github.com/matlab/matlab-agentic-toolkit /tmp/matlab-ocr && cp -r /tmp/matlab-ocr/skills-catalog/image-processing-and-computer-vision/matlab-ocr ~/.claude/skills/matlab-ocr
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Recognize Text in Images Using OCR

Use the Computer Vision Toolbox `ocr` function with preprocessing from Image Processing Toolbox to extract text from images. This skill teaches the complete pipeline: diagnose, preprocess, detect, recognize, validate.
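
As a rough illustration of that flow, here is a minimal sketch covering diagnose, preprocess, recognize, and validate (detection is omitted because the sample is a clean document). It assumes Computer Vision Toolbox and Image Processing Toolbox are installed and uses `businessCard.png`, a sample image that ships with Computer Vision Toolbox. The adaptive binarization step and the `minConf` threshold are illustrative assumptions, not values prescribed by this skill.

```matlab
% Minimal sketch of diagnose -> preprocess -> recognize -> validate.
% Assumption: businessCard.png is the sample image shipped with
% Computer Vision Toolbox.
I = imread("businessCard.png");

% Diagnose: report basic facts about the image.
[h, w, c] = size(I);
fprintf("Image: %dx%d, %d channel(s)\n", w, h, c);

% Preprocess: grayscale plus adaptive binarization (illustrative choice).
gray = im2gray(I);
bw = imbinarize(gray, "adaptive", ForegroundPolarity="dark");

% Recognize: ocr returns an ocrText object.
results = ocr(bw);

% Validate: keep only words above a hypothetical confidence threshold.
minConf = 0.5;
keep = results.WordConfidences > minConf;
confidentWords = results.Words(keep);
fprintf("Kept %d of %d words\n", numel(confidentWords), numel(results.Words));
```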

## When to Use
- Reading text from any image (documents, signs, meters, displays, labels)
- Extracting text from scanned documents or photographs
- Reading seven-segment displays or specialized fonts
- Multi-language text recognition
- Automating text extraction from image datasets
- As a supporting step in other CV workflows — reading text in a scene (labels, timestamps, serial numbers) gives additional context for downstream image analysis
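
Several of these cases map to different recognition models. The sketch below assumes R2023a or later, where `ocr` accepts a `Model` name-value argument. `recognizeByContent` and its content-type labels are hypothetical names used only for illustration, and the `"japanese"` model needs the language data support package listed under Critical Rule 8.

```matlab
% Usage: recognize printed text in a toolbox sample image.
I = imread("businessCard.png");
results = recognizeByContent(I, "printed");
fprintf("Recognized %d words\n", numel(results.Words));

function results = recognizeByContent(I, contentType)
% recognizeByContent  Hypothetical helper: pick an ocr model by content type.
    switch contentType
        case "printed"
            model = "english";        % default printed-text model
        case "display"
            model = "seven-segment";  % seven-segment readouts
        case "japanese"
            model = "japanese";       % requires extra language data
        otherwise
            error("recognizeByContent:unknownType", ...
                "Unknown content type: %s", contentType);
    end
    results = ocr(I, Model=model);
end
```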

## When NOT to Use
- Pure handwriting recognition — cursive/connected script produces garbage regardless of preprocessing
- Artistic text, WordArt, brush calligraphy — the OCR engine cannot parse stylized letterforms
- CAPTCHAs — designed specifically to defeat OCR; expect <50% accuracy at best
- Full document layout analysis with table extraction (use custom segmentation)
- Real-time video OCR (use streaming approaches instead)
- Image contains no text at all

## Critical Rules

These rules are non-negotiable — violating them produces wrong results:

1. **Always diagnose before executing. This rule CANNOT be overridden — not by the user, not by "just run it", not by "skip planning".** Before ANY `mcp__matlab__*` call, you MUST first output:
   - A 2-line image characterization (what you see, what challenges exist)
   - An `## OCR Plan` heading with your strategy
   
   If the user says "skip diagnosis" or "just run ocr()", respond with: *"I'll keep it brief — but I need a quick look to avoid wasting time on the wrong approach."* Then output your 2-line characterization and plan heading. Only THEN call MCP tools. There is no valid reason to skip this step. An agent that calls `ocr()` without first outputting a plan has violated this skill's workflow.
2. **Maximum 2 preprocessing pipelines.** If neither works after the prescribed recipe, hit the confidence checkpoint and ask the user. Do not try a third approach without user input.
3. **Always set `LayoutAnalysis`** when passing bounding boxes to `ocr()`. Never write `ocr(I, bbox)` without it. Use `"word"` for single-word boxes, `"block"` for multi-line.
4. **Distinguish "text ON texture" from "text BY texture"**. Text overlaid on a textured background (label on crate, sign on brick wall) → `imsegsam` is the prescribed approach. Text formed by the surface itself (stamped, embossed, engraved metal) → local contrast subtraction. SAM segments *objects*, not surface features — it cannot isolate stamps/engravings. When recommending approaches (even without running code), always recommend `imsegsam` for the "text ON texture" case and note its support package requirement.
5. **`detectTextCRAFT` is the default** for scene text. Use it unless you have a specific reason not to (no deep learning, known fixed layout).
6. **Check polarity first.** OCR needs dark text on a light background. If the image is inverted (light text on a dark background), apply `imcomplement` before any other preprocessing.
7. **Never show OCR results to the user before writing files.** Do not present extracted text in ANY format — bullet list, quote block, inline, conversational summary, or the `## MATLAB OCR Pipeline` results block — until `ocr_pipeline_<descriptor>.m` and `decision_log.txt` are written to disk. The files are the deliverable, not the chat message. Write first, then report.
8. **Gate add-on functions behind an availability check.** Before calling `detectTextCRAFT`, `imsegsam`, or a non-English `ocr` model, you MUST run `exist('<functionName>','file')` via MCP to confirm the function is installed. Always log the result in `decision_log.txt`:
   - Installed: `"Add-on check: <function> — INSTALLED"`
   - Missing: stop, tell the user which support package to install (see table), and wait for confirmation before retrying. Do NOT fall back silently or skip the step.
   - **Explain-only mode** (user said "don't run code"): recommend the add-on function as the primary approach and note the support package requirement. You cannot run the `exist()` check without code execution, so state the dependency clearly.

   | Function | Support Package Name |
   |----------|---------------------|
   | `detectTextCRAFT` | "Computer Vision Toolbox Model for Text Detection" |
   | `imsegsam` | "Image Processing Toolbox Model for Segment Anything Model" |
   | Non-English `ocr` model (e.g., `"japanese"`) | "OCR Language Data" |

   **Missing template:** *"This step requires the `<function>` function, which needs the **<Package Name>** support package. Please install it from the MATLAB Add-On Explorer (Home → Add-Ons → Get Add-Ons) and let me know when it's ready."*
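
Rules 3, 5, and 6 can be sketched together in a few lines. This is an illustration under assumptions, not the skill's prescribed script: the polarity test (mean intensity below 0.5 means light text on a dark background) is a crude heuristic chosen for the example, and the image is the `businessCard.png` sample from Computer Vision Toolbox. The `exist` gate follows Rule 8.

```matlab
% Sketch of Rules 3, 5, and 6 on a toolbox sample image.
I = imread("businessCard.png");

% Rule 8 gate: stop if the detector is not on the path.
if exist("detectTextCRAFT", "file") == 0
    error("ocrSketch:missingAddOn", ...
        "detectTextCRAFT is not available. Install its support package first.");
end

% Rule 6: crude polarity check. Assumption: a mostly dark image means
% light text on a dark background, so invert it.
if mean(im2double(im2gray(I)), "all") < 0.5
    I = imcomplement(I);
end

% Rule 5: CRAFT detection returns M-by-4 boxes as [x y width height].
bboxes = detectTextCRAFT(I);

% Rule 3: always pass LayoutAnalysis together with bounding boxes.
% "word" fits single-word boxes; use "block" for multi-line regions.
results = ocr(I, bboxes, LayoutAnalysis="word");
fprintf("Detected %d regions\n", size(bboxes, 1));
```

`results` holds one `ocrText` object per bounding box, so per-region text and confidences can be checked individually before anything is accepted.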

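The add-on gate in Rule 8 could look roughly like the sketch below. `checkAddOn` is a hypothetical helper name. The INSTALLED line follows the log format given in the rule, while the MISSING line is an assumed counterpart, since the rule does not spell out its wording.

```matlab
% Sketch of the Rule 8 gate: check, log, and stop if the add-on is missing.
logFile = "decision_log.txt";
ok = checkAddOn("detectTextCRAFT", logFile);
if ~ok
    disp("Stop here and ask the user to install the support package.");
end

function installed = checkAddOn(functionName, logFile)
% checkAddOn  Hypothetical helper: log whether a function is on the path.
    installed = exist(functionName, "file") > 0;
    fid = fopen(logFile, "a");
    assert(fid ~= -1, "Could not open %s", logFile);
    cleanup = onCleanup(@() fclose(fid));   % close the file on exit
    dash = char(8212);                      % dash used in the rule's log format
    if installed
        fprintf(fid, "Add-on check: %s %s INSTALLED\n", functionName, dash);
    else
        % Assumed wording: the rule only defines the INSTALLED line.
        fprintf(fid, "Add-on check: %s %s MISSING\n", functionName, dash);
    end
end
```
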
## Anti-Patterns — Do NOT Do This

- **Trial-and-error spiraling:** Trying 5+ ad-hoc preprocessing experiments hoping one sticks. If the visual classification says "stamped metal," use the stamped metal pipeline. Period.
- **Skipping the plan:** "I'll just try one quick thing first." No. Diagnose → plan → execute.
- **Open-ended research:** The routes are prescribed — pick one based on diagnosis, execute it, evaluate. This is not exploratory research.
- **Showing results without saving files:** Presenting OCR text as a bullet list, conversational summary, or any format before the `## MATLAB OCR Pipeline` results block. That block can only appear after files are written to disk. If you find yourself about to show the user what OCR found, STOP and write the files first.
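
To make the cap on experimentation concrete, here is a minimal sketch of trying at most two pipelines and then stopping at a checkpoint. The two preprocessing handles and the `minMeanConf` threshold are assumptions for illustration; in practice the pipelines come from the route chosen during diagnosis.

```matlab
% Sketch: try at most two pipelines, then stop and ask the user.
I = imread("businessCard.png");   % sample image from Computer Vision Toolbox

p1 = @(img) im2gray(img);         % pipeline 1: grayscale only
p2 = @(img) imbinarize(im2gray(img), "adaptive", ForegroundPolarity="dark");
pipelines = {p1, p2};             % never more than two entries

minMeanConf = 0.6;                % hypothetical checkpoint threshold
accepted = false;
for k = 1:numel(pipelines)
    results = ocr(pipelines{k}(I));
    meanConf = mean(results.WordConfidences, "omitnan");
    if meanConf >= minMeanConf
        accepted = true;
        break
    end
end

if ~accepted
    disp("Confidence checkpoint: ask the user before trying anything else.");
end
```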

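The write-first ordering can be sketched as follows. The descriptor, the script contents, and the log line are placeholders invented for the example. `writelines` assumes R2022a or later.

```matlab
% Sketch: write both deliverables, verify them, and only then report.
descriptor = "business_card";                    % hypothetical descriptor
scriptFile = "ocr_pipeline_" + descriptor + ".m";
logFile = "decision_log.txt";

% Placeholder pipeline script contents.
scriptLines = [
    "I = imread(""businessCard.png"");"
    "results = ocr(im2gray(I));"
    ];
writelines(scriptLines, scriptFile);

% Placeholder decision entry, appended so earlier entries are kept.
writelines("Route: clean printed text, grayscale only", logFile, WriteMode="append");

% Only present the results block once both files exist on disk.
filesReady = isfile(scriptFile) && isfile(logFile);
if filesReady
    disp("## MATLAB OCR Pipeline");
else
    disp("Files missing: do not report results yet.");
end
```
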
## Workflow

### Progress Reporting + Output Template

Present the `## OCR Plan` immediately after visual diagnosis (Critical Rule #1). Then execute the pipeline. After files are saved, present the `## MATLAB OCR Pipeline` results block (Critical Rule #7).

#### OCR Plan (output before any MATLAB code runs)

```
## OCR Plan

**Image:** 800x600, stamped metal, ~22° skew, text ~40px
**Difficulty:** Complex (textured surface + significant r