oma-pdf

The oma-pdf skill converts PDF files into structured Markdown or other formats while preserving document layout, headings, tables, and images for AI processing and retrieval. Use it when you need to extract PDF content for language model context, prepare documents for RAG systems, or convert scanned PDFs requiring OCR into readable text formats.

Repository: oh-my-agent

Install in Claude Code

git clone --depth 1 https://github.com/first-fluke/oh-my-agent /tmp/oma-pdf && cp -r /tmp/oma-pdf/benchmarks/runs/oma/.agents/skills/oma-pdf ~/.claude/skills/oma-pdf

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# PDF Skill - PDF to Markdown Conversion

## Scheduling

### Goal
Convert PDF files into structured Markdown or another requested extraction format while preserving readable document structure for LLM context, RAG, or downstream review.

### Intent signature
- User asks to convert, parse, read, extract, or transform a PDF.
- User needs PDF text, headings, lists, tables, or images prepared for AI consumption.
- User mentions "PDF to markdown", "parse PDF", "read this PDF", or equivalent wording.

### When to use
- Converting PDF documents to Markdown for LLM context or RAG
- Extracting structured content such as tables, headings, lists, images, footnotes, or hyperlinks
- Preparing PDF data for AI consumption
- Checking whether a PDF has a text layer before choosing OCR

### When NOT to use
- Generating or creating PDFs -> use document-generation tools
- Editing existing PDFs -> out of scope
- Reading an already-text file -> use direct file reading
- Processing HWP, HWPX, DOCX, XLSX, or slide decks -> use the matching document skill

### Expected inputs
- `input_path`: PDF file or folder path
- `output_dir`: optional target directory
- `format`: optional output format, default `markdown`
- `ocr_languages`: optional OCR language list for scanned or image-based PDFs
- `extraction_options`: optional flags for tagged structure, image extraction, or hybrid conversion

### Expected outputs
- Markdown, text, JSON, HTML, or combined extraction output
- Normalized Markdown when Markdown is produced
- A short report with output path, page count, and conversion issues

### Dependencies
- `uvx opendataloader-pdf` for standard conversion
- `uvx opendataloader-pdf-hybrid` for OCR or hybrid conversion
- `uvx mdformat` for Markdown normalization
- Local filesystem access to input and output paths
- Optional OCR runtime via the hybrid server

### Control-flow features
- Branches on text-layer quality, tagged PDF availability, scan/OCR needs, and user-requested output format
- Calls external CLI tools through `uvx`
- Reads local files and writes local extraction outputs
- Uses a hybrid server only when OCR or complex extraction needs justify it

## Structural Flow

### Entry
1. Confirm that the input path exists and is a PDF file, a folder of PDFs, or another supported batch input.
2. Check file size and warn when the input is large enough to risk slow conversion or memory pressure.
3. Resolve `output_dir` and the expected output filename.
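
The entry checks above could be sketched as a small Bash preflight. This is a minimal illustration rather than the skill's actual implementation: the function name `preflight`, the 50 MB warning threshold, the `./output` default, and the `<basename>.md` output naming are all assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper; names, threshold, and defaults are illustrative assumptions.

LARGE_FILE_BYTES=$((50 * 1024 * 1024))  # assumed warning threshold (50 MB)

# preflight INPUT_PATH [OUTPUT_DIR]
# Prints the expected output path on stdout (the output directory for folder input).
# Returns 1 when the input is missing or is neither a PDF file nor a folder.
preflight() {
  local input_path="$1"
  local output_dir="${2:-./output}"  # assumed default when output_dir is omitted

  if [ -d "$input_path" ]; then
    # Folder input: results are written directly under the output directory.
    printf '%s\n' "$output_dir"
    return 0
  fi

  if [ ! -f "$input_path" ]; then
    echo "error: input path not found: $input_path" >&2
    return 1
  fi

  case "$input_path" in
    *.pdf|*.PDF) ;;
    *) echo "error: not a PDF file: $input_path" >&2; return 1 ;;
  esac

  # Warn, but keep going, when the file is large enough to risk slow conversion.
  local size
  size=$(wc -c < "$input_path" | tr -d ' ')
  if [ "$size" -gt "$LARGE_FILE_BYTES" ]; then
    echo "warning: large input ($size bytes); conversion may be slow" >&2
  fi

  # Assumed output naming: same base name with a .md extension.
  local base
  base=$(basename "$input_path")
  printf '%s/%s.md\n' "$output_dir" "${base%.*}"
}
```

The check relies on the file extension only; a stricter version might also inspect the file header before converting.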

### Scenes
1. **PREPARE**: Validate the input path, output target, and requested extraction options.
2. **ACQUIRE**: Assess whether the PDF has a readable text layer by extracting a text preview.
3. **ACT**: Convert using standard mode, tagged-structure mode, or hybrid OCR mode.
4. **VERIFY**: Run `mdformat` for Markdown output and inspect the result for readable structure.
5. **FINALIZE**: Report output path, page count, format, and any extraction quality issues.
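
The ACQUIRE scene hinges on deciding whether a text preview is usable. One rough way to sketch that decision is to count non-whitespace bytes in the preview. The helper name `has_text_layer` and the 50-byte threshold are assumptions, and the check only detects a missing text layer, not garbled text.

```bash
#!/usr/bin/env bash
# Hypothetical text-layer probe; the threshold is an illustrative assumption.

MIN_TEXT_BYTES=50  # assumed minimum of non-whitespace bytes for a usable preview

# has_text_layer: reads a text preview on stdin.
# Succeeds (exit 0) when the preview holds at least MIN_TEXT_BYTES non-whitespace bytes.
has_text_layer() {
  local count
  count=$(tr -d '[:space:]' | wc -c | tr -d ' ')
  [ "$count" -ge "$MIN_TEXT_BYTES" ]
}
```

A preview could be piped in from any extractor. For instance, with Poppler installed, `pdftotext -l 2 input.pdf - | has_text_layer` would test the first two pages; that pairing is an assumption for illustration, not something the skill prescribes.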

### Transitions
- If the preview text is readable, use standard conversion.
- If the PDF is tagged and standard output is garbled, retry with `--use-struct-tree`.
- If the PDF is scanned or image-based, start or reuse the hybrid OCR server and convert with hybrid mode.
- If conversion fails because the PDF is encrypted, stop and ask for the password or an unlocked copy.
- If conversion hits memory or size limits, process smaller page ranges or batches.
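
The first three transitions above amount to a small decision function. The sketch below is one possible reading, with a hypothetical `choose_mode` helper and assumed mode names (`standard`, `struct-tree`, `hybrid`); the encrypted-PDF and memory-limit transitions are deliberately left out because they are handled after a conversion attempt fails.

```bash
#!/usr/bin/env bash
# Hypothetical strategy selector; function and mode names are illustrative assumptions.

# choose_mode TEXT_READABLE TAGGED SCANNED   (each argument is "yes" or "no")
# Prints one of: standard, struct-tree, hybrid
choose_mode() {
  local text_readable="$1" tagged="$2" scanned="$3"

  if [ "$scanned" = "yes" ]; then
    echo "hybrid"        # scanned or image-based PDFs go through OCR
  elif [ "$text_readable" = "yes" ]; then
    echo "standard"      # readable preview: plain conversion is enough
  elif [ "$tagged" = "yes" ]; then
    echo "struct-tree"   # garbled but tagged: retry with --use-struct-tree
  else
    echo "hybrid"        # garbled and untagged: fall back to hybrid mode
  fi
}
```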

### Failure and recovery
| Failure | Recovery |
|---------|----------|
| `uvx` unavailable | Ask user to install `uv` before conversion |
| Password-protected PDF | Ask for password or unlocked PDF |
| Garbled output | Retry with tagged structure or hybrid mode |
| Missing tables | Retry with hybrid mode for complex or borderless tables |
| OCR language mismatch | Retry with explicit OCR languages, for example `ko,en` |
| Large file or memory pressure | Split into page ranges or batch smaller inputs |
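
The first row of the table can be checked before any conversion is attempted. The following is a minimal sketch using the shell's built-in `command -v`; the helper name `require_tool` and the wording of the hint are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical dependency check; the helper name and messages are illustrative assumptions.

# require_tool NAME [HINT]
# Returns 1 and prints a message on stderr when NAME is not found on PATH.
require_tool() {
  local name="$1" hint="${2:-}"
  if ! command -v "$name" >/dev/null 2>&1; then
    echo "error: '$name' not found on PATH. $hint" >&2
    return 1
  fi
}

# Example call (left commented out so sourcing this file has no side effects):
# require_tool uvx "Install uv, then retry the conversion."
```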

### Exit
- Success: output file exists, Markdown is formatted when applicable, and extracted structure is readable.
- Partial success: output exists but quality issues are reported explicitly.
- Failure: no reliable output is produced and the blocking cause is reported.
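
The three exit states could be derived mechanically from two facts: whether an output file was produced and how many quality issues were noted. The sketch below assumes a hypothetical `classify_result` helper, treats an empty file as no output, and expects the caller to supply the issue count.

```bash
#!/usr/bin/env bash
# Hypothetical exit-state classifier; the helper name and inputs are illustrative assumptions.

# classify_result OUTPUT_PATH [ISSUE_COUNT]
# Prints one of: success, partial, failure
classify_result() {
  local output_path="$1" issue_count="${2:-0}"

  if [ ! -s "$output_path" ]; then
    echo "failure"   # no output file, or an empty one
  elif [ "$issue_count" -gt 0 ]; then
    echo "partial"   # output exists but quality issues were recorded
  else
    echo "success"
  fi
}
```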

## Logical Operations

### Actions
| Action | SSL primitive | Evidence |
|--------|---------------|----------|
| Validate path and options | `VALIDATE` | Input preflight in execution protocol |
| Probe text layer | `READ` | Text preview extraction |
| Choose conversion strategy | `SELECT` | Standard, tagged, or hybrid mode decision |
| Run converter | `CALL_TOOL` | `uvx opendataloader-pdf` |
| Start OCR server | `CALL_TOOL` | `uvx opendataloader-pdf-hybrid` |
| Write output artifact | `WRITE` | Markdown, text, JSON, or HTML output |
| Normalize Markdown | `CALL_TOOL` | `uvx mdformat` |
| Inspect extraction quality | `VALIDATE` | Structure/readability verification |
| Report result | `NOTIFY` | Final user-facing summary |

### Tools and instruments
- `opendataloader-pdf`: primary PDF extraction CLI
- `opendataloader-pdf-hybrid`: hybrid OCR and complex extraction path
- `mdformat`: Markdown normalization
- Command-line utilities such as `file`, `wc`, or `pdfinfo` may be used for preflight when available

### Canonical command path
```bash
uvx opendataloader-pdf "{input_path}" --format markdown --output-dir "{output_dir}"
uvx mdformat "{output_path}"
```

For scanned or image-based PDFs, start the hybrid OCR server first (or reuse one that is already running), then convert through hybrid mode:
```bash
uvx opendataloader-pdf-hybrid --port 5002 --force-ocr --ocr-lang "{languages}"
uvx opendataloader-pdf --hybrid docling-fast "{input_path}" --format markdown --output-dir "{output_dir}"
```
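
The converter invocations above differ only in a few flags, so they could be assembled by one wrapper. This is a sketch under assumptions: the `run_convert` helper, its mode names, the `DRY_RUN` switch, and the position of `--use-struct-tree` on the command line are all illustrative; only the flags themselves come from this document. It does not start the hybrid server.

```bash
#!/usr/bin/env bash
# Hypothetical converter wrapper; helper name, mode names, and DRY_RUN are illustrative assumptions.

# run_convert MODE INPUT_PATH OUTPUT_DIR
# MODE is one of: standard, struct-tree, hybrid
# With DRY_RUN=1 the command is printed instead of executed.
run_convert() {
  local mode="$1" input_path="$2" output_dir="$3"
  local -a cmd=(uvx opendataloader-pdf)

  case "$mode" in
    standard) ;;
    struct-tree) cmd+=(--use-struct-tree) ;;   # flag placement is an assumption
    hybrid) cmd+=(--hybrid docling-fast) ;;    # expects the hybrid server to be running
    *) echo "error: unknown mode: $mode" >&2; return 1 ;;
  esac

  cmd+=("$input_path" --format markdown --output-dir "$output_dir")

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Preview only: arguments are joined with spaces and not re-quoted.
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Running `DRY_RUN=1 run_convert hybrid report.pdf out` prints the command line without invoking `uvx`, which is a cheap way to review the chosen flags before a long conversion.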

### Resource scope
| Scope | Resource target |
|-------|-----------------|
| `LOCAL_FS` | Input PDFs and generated output files |
| `PROCESS` | `uvx` subprocesses and optional hybrid server |
| `MEMORY` | Extracted previews and validation notes |
| `OTHER` | OCR model/runtime behavior inside hybrid mode |

### Preconditions
- The input PDF path exists and is readable.
- The output locati

More from this repository

oma-academic-writer (Skill)

oma-architecture (Skill)

Architecture specialist for software/system design, module and service boundaries, tradeoff analysis, and stakeholder synthesis. Uses context-aware methods such as diagnostic routing, design-twice comparison, ATAM-style risk analysis, CBAM-style prioritization, and ADR-style decision records.

oma-backend (Skill)

Backend specialist for APIs, databases, authentication with clean architecture (Repository/Service/Router pattern). Use for API, endpoint, REST, database, server, migration, and auth work.

oma-brainstorm (Skill)

Design-first ideation that explores user intent, constraints, and approaches before any planning or implementation. Use for brainstorming, ideation, exploring concepts, and evaluating approaches.

oma-coordination (Skill)

Guide for coordinating PM, Frontend, Backend, Mobile, and QA agents on complex projects via CLI. Use for manual step-by-step coordination and workflow guidance.

oma-db (Skill)

Database specialist for SQL, NoSQL, and vector database modeling, schema design, normalization, indexing, transactions, integrity, concurrency control, backup, capacity planning, data standards, anti-pattern review, and compliance-aware database design. Use for database, schema, ERD, table design, document model, vector index design, RAG retrieval architecture, migration, query tuning, glossary, capacity estimation, backup strategy, database anti-pattern remediation work, and ISO 27001, ISO 27002, or ISO 22301-aware database recommendations.

oma-debug (Skill)

Bug diagnosis and fixing specialist - analyzes errors, identifies root causes, provides fixes, and writes regression tests. Use for bug, debug, error, crash, traceback, exception, and regression work.

oma-deepsec (Skill)