Skill · 4.9k repo stars · updated 1mo ago

document-converter

The document-converter skill processes large or complex documents by first sampling their structure and content before performing full analysis. Use this skill when users need documents read, summarized, analyzed, or converted into Markdown artifacts or other formats, with special attention to handling large files efficiently through incremental processing rather than immediate full conversion.

Repository: magic

Install in Claude Code

git clone --depth 1 https://github.com/dtyq/magic /tmp/document-converter && cp -r /tmp/document-converter/backend/super-magic/agents/skills/document-converter ~/.claude/skills/document-converter

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Document Converter Workflow

Use this workflow when the user asks to read, summarize, analyze, or convert a large or complex document.

Do not start by converting the entire file into one large Markdown document. First build a lightweight understanding of the document, sample a few representative units, then decide how to read the rest.

## What Conversion Means

This skill uses "conversion" in two document-specific ways:

- Semantic conversion: turn complex documents into model-readable Markdown artifacts, such as `document.md`, `chunks/`, `document.index.json`, `document.outline.md`, `document.reading_state.json`, visual-understanding records, and summaries. Use this for normal reading, analysis, summarization, and Markdown export.
- Format conversion: create a new file in another supported document format. Use this only when the user explicitly asks for a converted file or when a format mismatch must be repaired before extraction.

Currently supported raw format conversions:

- PDF -> `png`, `jpg`, `jpeg`
- Office-like documents -> `pdf`, `docx`, `pptx`, `xlsx`

Office-like documents include Word, PowerPoint, spreadsheets, WPS Office, OpenDocument, RTF, templates, macro-enabled files, and slide-show files when the current runtime converter can handle them.

Other document types should usually be read or exported through semantic conversion, especially `export_document_markdown`, rather than forced through `convert_document_format`.
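
The supported-conversion list above can be checked with a small lookup before attempting a format conversion. This is a minimal sketch, not the skill's implementation: the family names `"pdf"` and `"office"`, the table `SUPPORTED_TARGETS`, and the function `is_supported_conversion` are hypothetical names introduced for illustration.

```python
# Hypothetical lookup table mirroring the supported raw format conversions.
# The keys "pdf" and "office" are illustrative family names, not SDK values.
SUPPORTED_TARGETS = {
    "pdf": {"png", "jpg", "jpeg"},
    "office": {"pdf", "docx", "pptx", "xlsx"},
}


def is_supported_conversion(source_family, target_format):
    """Return True if the target format is listed for the source family."""
    targets = SUPPORTED_TARGETS.get(source_family.lower(), set())
    # Accept "PNG", ".png", and "png" as the same target.
    return target_format.lower().lstrip(".") in targets


if __name__ == "__main__":
    print(is_supported_conversion("pdf", "PNG"))
    print(is_supported_conversion("markdown", "pdf"))
```

A source family that is missing from the table returns `False`, which matches the guidance to fall back to semantic conversion for other document types.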

## Default Approach

1. Inspect the document first: identify its type, size, structure, outline, and representative samples.
2. If the file is small, export it directly as one readable Markdown file.
3. If the file is large, sample a few representative units before choosing a full reading strategy.
4. Plan the next read from the sample result and the user's goal.
5. Extract or understand only the selected ranges into reusable Markdown artifacts.
6. Read `document.reading_state.json` between large-document steps to avoid rereading the same content.
7. For summaries, summarize smaller chunks first, then combine them into section-level and document-level summaries.
8. Only perform format conversion when the user explicitly asks for a converted file or when conversion is necessary for extraction.
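
Step 6 relies on the reading state to avoid rereading content. The sketch below shows one way to compute which ranges are still unread. It assumes a hypothetical state layout in which `document.reading_state.json` holds a `read_ranges` list of inclusive `[start, end]` pairs; the real file layout may differ, and the helper names are illustrative only.

```python
import json
from pathlib import Path


def load_read_ranges(state_path):
    """Return the ranges already read, or [] when no state file exists."""
    path = Path(state_path)
    if not path.exists():
        return []
    state = json.loads(path.read_text(encoding="utf-8"))
    # "read_ranges" is an assumed key, not a documented field.
    return [tuple(r) for r in state.get("read_ranges", [])]


def unread_ranges(total_units, read_ranges):
    """Return inclusive 1-based ranges in 1..total_units not yet read."""
    gaps = []
    next_unread = 1
    for start, end in sorted(read_ranges):
        if start > next_unread:
            gaps.append((next_unread, min(start - 1, total_units)))
        next_unread = max(next_unread, end + 1)
        if next_unread > total_units:
            break
    if next_unread <= total_units:
        gaps.append((next_unread, total_units))
    return gaps


if __name__ == "__main__":
    print(unread_ranges(20, [(1, 5), (11, 15)]))  # [(6, 10), (16, 20)]
```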

## Large Document Rules

- Do not read or convert a large document all at once unless the user explicitly requires a full export.
- Always sample first for large files, then choose the next batch from the sample result.
- Use the outline, samples, and `document.reading_state.json` to decide what to read next.
- Prefer text extraction for ordinary document body content.
- Do not use one strategy blindly for the whole file. If the sample shows extractable text, read text ranges. If the sample is image-dominant or scanned, understand document images in batches.
- Do not call the generic visual understanding tool directly for document parsing. Use `understand_document_images` so results are saved and written back to chunks.
- When image understanding is used, process at most 10 images per call. Successful results are written back to the related chunk, and `visual-results/` is preserved as the per-image recognition record.
- Before starting a large visual-understanding workload, call `ask_user` to ask whether the user wants to continue because it may take a long time. This applies when many batches are needed, such as scanned PDFs or slide decks with many image-only pages.
- Do not call `ask_user` for small visual reads, such as a document with only a few pages or a single batch of up to 10 images that is clearly needed for the user's request.
- For spreadsheets, inspect sheets, headers, sample rows, and table size before extracting data.
- For slide decks, treat each slide as a natural unit.
- For Word-like documents, follow the heading structure when available.
- For notebooks, preserve cell order and cell type.
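
The 10-image limit and the confirmation rule above can be sketched as a batching helper. This is an illustration under stated assumptions: the rules do not define how many batches count as "many", so the `many_batches=3` default is an assumed threshold, and both function names are hypothetical rather than part of the SDK.

```python
MAX_IMAGES_PER_CALL = 10  # limit stated in the rules above


def batch_images(image_ids, batch_size=MAX_IMAGES_PER_CALL):
    """Split image ids into consecutive batches of at most batch_size."""
    if not 1 <= batch_size <= MAX_IMAGES_PER_CALL:
        raise ValueError("batch_size must be between 1 and 10")
    return [
        image_ids[i:i + batch_size]
        for i in range(0, len(image_ids), batch_size)
    ]


def needs_confirmation(batch_count, many_batches=3):
    """Return True when the workload is large enough to ask the user first.

    many_batches is an assumed threshold; the rules only say "many batches".
    """
    return batch_count >= many_batches


if __name__ == "__main__":
    batches = batch_images(list(range(25)))
    print([len(b) for b in batches])         # [10, 10, 5]
    print(needs_confirmation(len(batches)))  # True
```

A single batch never triggers confirmation under this sketch, which matches the rule for small visual reads.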

## Small Document Rules

- Small files can use `export_document_markdown` directly. The tool chooses the output artifact mode automatically.
- For small files, prefer the simple output structure: `document.md`, plus `assets/` when needed. If visual understanding runs, keep `visual-results/` as the recognition record.
- Do not create samples, chunks, indexes, or reading state just to read a small file.
- A PDF, Word file, PowerPoint file, or image set with 10 or fewer pages/slides/images is usually small.
- If a PDF, Word file, PowerPoint file, or image set has more than 10 pages/slides/images, use the large-document workflow instead of expecting a flat one-file result.
- Text, Markdown, and HTML files are small when they fit comfortably within one normal chunk.
- Small scanned files can be extracted and visually understood directly when needed; do not call `ask_user` unless many visual-understanding batches are required.
- Large-document exports automatically use progressive artifacts.
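
The 10-unit rule of thumb for PDFs, Word files, PowerPoint files, and image sets can be written as a simple check. This is a minimal sketch: `choose_workflow` is a hypothetical helper, and because the rules say such files are "usually" small, the result is a default rather than a hard decision. Text, Markdown, and HTML files are left out because the rules do not define the chunk size.

```python
SMALL_UNIT_LIMIT = 10  # pages, slides, or images, per the rules above


def choose_workflow(unit_count):
    """Return "small" or "large" for a paged document or image set."""
    if unit_count < 0:
        raise ValueError("unit_count must not be negative")
    return "small" if unit_count <= SMALL_UNIT_LIMIT else "large"


if __name__ == "__main__":
    print(choose_workflow(10))  # small
    print(choose_workflow(11))  # large
```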

## When The User Wants A Summary

For a large document summary, do not send all extracted text into the model at once.

Use this sequence:

1. Inspect the structure.
2. Sample representative pages, slides, sheets, or sections.
3. Plan the next read from the sample and the user's goal.
4. Extract relevant chunks, or understand image pages in batches when the document is scanned.
5. Summarize each chunk.
6. Merge chunk summaries into section summaries.
7. Merge section summaries into a final answer.
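
Steps 5 to 7 describe a hierarchical merge: summarize chunks, then sections, then the whole document. The sketch below shows that shape only. `summarize_document` and its `summarize` callback are hypothetical; the callback stands in for whatever model call produces a summary, and the section grouping is assumed to come from the earlier extraction steps.

```python
from typing import Callable, Dict, List


def summarize_document(
    sections: Dict[str, List[str]],
    summarize: Callable[[str], str],
) -> str:
    """Summarize chunks, merge them per section, then merge the sections."""
    section_summaries = []
    for title, chunks in sections.items():
        # Step 5: summarize each chunk on its own.
        chunk_summaries = [summarize(chunk) for chunk in chunks]
        # Step 6: merge the chunk summaries into one section summary.
        merged = summarize("\n".join(chunk_summaries))
        section_summaries.append(f"{title}: {merged}")
    # Step 7: merge the section summaries into the final answer.
    return summarize("\n".join(section_summaries))


if __name__ == "__main__":
    # A toy stand-in for a model call: keep the first 40 characters.
    def toy(text):
        return text[:40]

    doc = {"Intro": ["first chunk", "second chunk"], "Body": ["third chunk"]}
    print(summarize_document(doc, toy))
```

No single call receives the full extracted text, which is the point of the sequence above.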

## Code Mode Use

Use Code Mode as the execution path for this workflow. Keep the user-facing response focused on what was inspected, what was extracted, where the readable result is, and what can be done next. Do not expose internal class names, package paths, implementation details, or raw metadata unless the user asks for debugging details.

All document-converter SDK tool calls must be executed through the `run_sdk_snippet` tool by passing code in its `python_code` parameter.

Inside that `python_code`, use `from sdk.tool import tool` and call document-converter tools with `tool.
