Skill · 136 repo stars · updated yesterday

reading-business-cards

This Claude Code skill converts batch photos of business cards into structured CSV contact data by preprocessing sheets into overlapping high-resolution tiles, applying de-glaring to correct lighting and gloss reflections, then reading each tile via parallel API calls to extract and deduplicate contact fields. Use it when a user provides multiple business cards per image, mentions glare or readability problems, or needs efficient batch card transcription without expensive model passes.

Repository: claude-skills

Install in Claude Code

git clone --depth 1 https://github.com/oaustegard/claude-skills /tmp/reading-business-cards && cp -r /tmp/reading-business-cards/reading-business-cards ~/.claude/skills/reading-business-cards

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Reading Business Cards

Turn photos or scans that pack many business cards into one image into a clean
contact list. The job is two stages, in order: **preprocess with the script,
then read the tiles it produces.** Read the tiles, not the original sheet — the
original is too low-resolution per card once the model downscales it.

## Why preprocess first

The model downscales any input image to ~1568px on the long edge before it sees
it. A phone photo of 50-60 cards is often 4000-6000px; downscaled to one image,
each card lands ~150px wide — unreadable, which forces you onto a more expensive
model. The script cuts the sheet into overlapping **tiles**, each near the
downscale cap (~1300px), so every card in a tile keeps 500px+ of real
resolution. That resolution recovery is what lets a cheaper model (Sonnet) read
cards that only the expensive one (Opus) could read before — and it lets Opus
read cards from far fewer tiles. **The grid is sized to the reader model** (see
Stage 1): pass `--model` and the script tiles aggressively for Haiku, moderately
for Sonnet, and least for Opus, because a stronger reader has a lower resolution
floor. That floor never reaches zero, though: below it the limit is the *pixels*,
not the model's intelligence. At ~220px/card (a whole dense sheet) even Opus
reads only company and some names, not phone/email, so even Opus needs a few
tiles for fine print on dense sheets.
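
The downscale arithmetic above can be sketched in a few lines. This is an illustration only, not code from the skill: `effective_card_px` is a hypothetical helper, and the 668px card on a 5712px sheet is borrowed from the example output line shown under Stage 1.

```python
API_LONG_EDGE_CAP = 1568  # approximate long-edge cap quoted in this document


def effective_card_px(native_card_px: float, image_long_edge: float,
                      cap: int = API_LONG_EDGE_CAP) -> float:
    """Card width the model actually sees once the image is downscaled.

    Images whose long edge is already under the cap are not scaled at all.
    """
    scale = min(1.0, cap / image_long_edge)
    return native_card_px * scale


if __name__ == "__main__":
    # Whole sheet sent as one image: the card shrinks well below legibility.
    print(round(effective_card_px(668, 5712)))
    # A tile near the cap: the card keeps its native resolution.
    print(round(effective_card_px(668, 1300)))
```

The same card goes from a couple of hundred pixels to its full native width simply because the image it sits in is smaller, which is the whole case for tiling.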

De-glaring (illumination flattening + local contrast) is applied to each tile.
It corrects uneven lighting and the haze off glossy cards and plastic binder
sleeves. It cannot recover text where glare has clipped pixels to pure white —
that data is gone (see Limits).
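
To make "illumination flattening" concrete, here is a minimal pure-Python sketch of the general idea: divide each pixel by an estimate of its local background so that a lighting gradient cancels out. It is an assumption about the technique in general, not the script's implementation (which may well use an image library), and `flatten_illumination`, the box-mean background, and the mid-gray target of 128 are all illustrative choices.

```python
def flatten_illumination(gray, radius=2, target=128):
    """Flatten uneven lighting on a grayscale image given as a list of rows.

    Each pixel is divided by the mean of its (2*radius+1)-square neighbourhood
    and rescaled so that "same as the local background" maps to `target`.
    Slow and simplistic; meant to show the idea, not to process real photos.
    """
    height, width = len(gray), len(gray[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            window = [gray[j][i] for j in ys for i in xs]
            background = max(1.0, sum(window) / len(window))
            row.append(min(255, round(target * gray[y][x] / background)))
        out.append(row)
    return out


if __name__ == "__main__":
    # Blank paper lit unevenly from left (dark) to right (bright).
    shaded = [[100 + 10 * x for x in range(8)] for _ in range(5)]
    print(flatten_illumination(shaded, radius=1)[0])
```

A region clipped to pure white is uniform, so it comes out uniformly flat: there is no darker text pixel left for the division to bring back, which is why clipped glare stays unrecoverable.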

## Stage 1 — Tile the sheets

The person only provides the images and the goal ("read these cards"). Derive
every parameter yourself; do not ask them to choose grid sizes or flags.

Real sheets are messy: cards scattered at angles, packed in binder sleeves,
overlapping, piled. Detecting individual card boundaries fails on all of these.
**Tiling ignores card boundaries** — it slices the sheet into a grid of
overlapping rectangles. Each tile holds a few cards at high resolution; the
overlap means a card split by one tile's edge is whole in its neighbour.
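
As a rough sketch of what "a grid of overlapping rectangles" means, the helper below computes crop boxes for a given grid. It is hypothetical: `tile_boxes`, the equal-size layout, and the 0.1 overlap fraction are assumptions for illustration, and the script's own geometry may differ.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def _spans(length: int, n: int, overlap: float) -> List[Tuple[int, int]]:
    """Cover [0, length) with n equal spans; neighbours share `overlap` of a span."""
    size = length / (1 + (n - 1) * (1 - overlap))
    stride = size * (1 - overlap)
    spans = []
    for i in range(n):
        start = int(round(i * stride))
        # Pin the last span to the image edge so rounding never drops pixels.
        end = length if i == n - 1 else int(round(i * stride + size))
        spans.append((start, end))
    return spans


def tile_boxes(width: int, height: int, rows: int, cols: int,
               overlap: float = 0.1) -> Dict[Tuple[int, int], Box]:
    """Map (row, col) to a crop box; card positions are never consulted."""
    xs = _spans(width, cols, overlap)
    ys = _spans(height, rows, overlap)
    return {(r, c): (x0, y0, x1, y1)
            for r, (y0, y1) in enumerate(ys)
            for c, (x0, x1) in enumerate(xs)}


if __name__ == "__main__":
    for key, box in tile_boxes(4284, 5712, rows=3, cols=2).items():
        print(key, box)
```

Because neighbouring boxes share a strip, a card cut by the right edge of one tile falls inside the left part of the next, provided the strip is at least one card wide.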

Run it sized to whichever model will read the tiles — pass `--model` and the
script measures each sheet's native card size, then derives the coarsest grid
whose cards still clear that model's OCR floor:

```bash
python3 scripts/prep_cards.py /mnt/user-data/uploads --out /home/claude/cards_work --model opus
```

`--model opus|sonnet|haiku` (or a full id like `claude-opus-4-8`) sets the floor:
Opus tiles least, Haiku most. Default is `sonnet` (the safe middle). On a dense
~660px-card sheet this yields roughly 6 tiles/sheet for Opus, 9-12 for Sonnet,
12-20 for Haiku. If the reader is the in-conversation model (the no-key path,
below), set `--model` to match whatever you're running. Overrides: `--target-px`
forces an explicit tile size, `--card-px` overrides the auto card-size estimate,
`--floor-px` overrides the per-model floor.

It prints the grid it chose per image (e.g.
`auto 3x2 from 4284x5712 [model=opus floor=350 card~668px -> target 2993]`) and
writes tiles to `cards_work/tiles/` (`<sheet>__r{R}_c{C}.png`) plus a
`manifest.json`. If a sheet's cards are smaller than the model's floor even at
native resolution, it warns — that sheet needs a re-shoot, not a finer grid.
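
One way to read the example output line is sketched below. The formula is inferred from the printed numbers, not taken from `prep_cards.py`: a tile edge of `card_px * 1568 / floor` is the largest at which a card still shows `floor` pixels after the ~1568px downscale, and it happens to reproduce `target 2993`. The overlap value of 0.1 and the reading of `3x2` as rows by columns are assumptions.

```python
import math

API_LONG_EDGE_CAP = 1568  # approximate downscale cap quoted in this document


def target_tile_px(card_px: float, floor_px: float,
                   cap: int = API_LONG_EDGE_CAP) -> int:
    """Largest tile edge at which a card still spans floor_px after downscaling."""
    return round(card_px * cap / floor_px)


def tiles_along(length: int, tile_px: int, overlap: float) -> int:
    """Tiles needed to cover `length` when neighbours share `overlap` of a tile."""
    if length <= tile_px:
        return 1
    stride = tile_px * (1 - overlap)
    return math.ceil((length - tile_px) / stride) + 1


def choose_grid(width: int, height: int, card_px: float, floor_px: float,
                overlap: float = 0.1):
    """Return (rows, cols, target_px) for the coarsest grid that clears the floor."""
    if card_px < floor_px:
        # Cards are below the floor even at native resolution: re-shoot the sheet.
        raise ValueError("cards too small at native resolution; re-shoot needed")
    target = target_tile_px(card_px, floor_px)
    return (tiles_along(height, target, overlap),
            tiles_along(width, target, overlap),
            target)


if __name__ == "__main__":
    print(choose_grid(4284, 5712, card_px=668, floor_px=350))
```

With zero overlap the same inputs would give a 2x2 grid, so the overlap allowance is what pushes the long side to three tiles in this reading.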

Then **verify and self-adjust by inspection** — this is your judgment, not the
person's:
- View one representative tile. If the cards in it are crisp and fully legible,
proceed to read them all.
- If cards look small or dense (text fuzzy), rerun with a smaller cap for a finer
grid: `--target-px 1100`.
- If cards are clipped in half at tile edges, rerun with more overlap:
`--overlap 0.2`.
- If the cards are matte and text-only and a light haze remains, rerun with
`--binarize`. Skip binarize for color or logo-heavy cards — it flattens them.

De-glaring (illumination flattening + local contrast) is on by default and is
non-destructive. Decide on the extra flags by looking at a tile, then commit to
the full read.

## Stage 2 — Read and extract

There are two ways to read the tiles. **In-session reading is the universal path
and works for everyone, key or no key:** view the tiles with the `view` tool and
transcribe them in this conversation, applying the rules in
`prompts/haiku_extract.md` (one row per fully-visible card, skip edge-clipped
cards that are whole in a neighbour, never invent, `confidence` low when unsure).
The coarse model-sized grid is what makes this tractable (a dozen tiles per
sheet, not eighty), and the reader is the conversation model itself, which
already reads cards well. Cost here comes out of the subscription's usage
allowance, not per-token dollars; the coarser the grid (Opus), the fewer reads.

**The API runner is an optional optimization, available only when an API key is
present** (`API_KEY` in env / `/mnt/project/claude.env`). It sends each tile to a model in
a separate parallel temperature-0 call using the distilled prompt, dedupes, and
writes the CSV — reading outside the chat context, so it is cheaper per token and
runs many tiles at once. Without a key it cannot run; use the in-session path.
The runner bills the API account per token; it buys parallelism and a cheaper
meter, never accuracy.
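
Because tiles overlap, the same card can be read twice, which is why the runner dedupes. The sketch below shows one plausible approach, not the runner's actual logic: the field names (`name`, `company`, `email`, `phone`) and the "keep the fuller row" rule are assumptions, and the sample rows are made up.

```python
import re


def _norm(text):
    """Lowercase and strip everything but letters and digits for matching."""
    return re.sub(r"[^a-z0-9]+", "", (text or "").lower())


def dedupe_cards(rows):
    """Keep one row per card, preferring the row with more filled fields.

    Rows match on email when present, otherwise on name + company.
    Rows with no usable key are kept as-is rather than merged together.
    """
    best = {}
    unkeyed = []
    for row in rows:
        email = _norm(row.get("email"))
        name, company = _norm(row.get("name")), _norm(row.get("company"))
        if email:
            key = ("email", email)
        elif name or company:
            key = ("name", name, company)
        else:
            unkeyed.append(row)
            continue
        filled = sum(1 for value in row.values() if value)
        if key not in best or filled > best[key][0]:
            best[key] = (filled, row)
    return [row for _, row in best.values()] + unkeyed


if __name__ == "__main__":
    rows = [
        {"name": "Sample Person", "company": "Example Co",
         "email": "sample@example.com", "phone": ""},
        {"name": "Sample Person", "company": "Example Co",
         "email": "Sample@Example.com", "phone": "555-0100"},
    ]
    print(dedupe_cards(rows))
```

A key-based pass like this will not merge two reads of one card when only one of them caught the email, so a real run would still want a quick look at near-duplicate names.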

### Running the API runner (keyed path)

**Validate the model on a sample before the full run.** Haiku is ~6x cheaper but
its OCR is weaker; on phone photos of loose or angled cards it confidently
misreads names and marks the errors `high` confidence. (Gemini's free tier was
tested 2026-06-15 and read these poorly — do not reach for it here.) Tested
guidance:

1. Sample run with Haiku:
   ```bash
   python3 scripts/extract_cards.py --work /home/claude/cards_work \
     --out /home/claude/sample.csv --limit 8
   ```
2. View 2 of the tiles it read and compare against `sample.csv`. Check: are
names and companies correct? Is `confidence` honest (clear