pdf-toolkit

pdf-toolkit performs structural PDF operations: extract text and tables using pdfplumber, merge or split pages across multiple PDFs with range syntax, fill form fields from JSON data, or generate new PDFs from data via reportlab. Use this skill for deterministic programmatic work like pulling tables from reports, combining documents, extracting page ranges, populating tax forms, or building PDFs from structured data, rather than the sibling nano-pdf skill which rewrites content using natural language.

Install in Claude Code
git clone --depth 1 https://github.com/opensquilla/opensquilla /tmp/pdf-toolkit && cp -r /tmp/pdf-toolkit/src/opensquilla/skills/bundled/pdf-toolkit ~/.claude/skills/pdf-toolkit
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# pdf-toolkit

Deterministic, structural PDF operations. Use this skill for programmatic
work where you know exactly what you want done. Use the sibling `nano-pdf`
skill instead when the task is "rewrite this page to say X" — `nano-pdf`
applies a natural-language edit; `pdf-toolkit` applies an explicit operation.

## Decide the operation

| Goal | Script |
|---|---|
| Get text or tables out of a PDF | `extract.py` |
| Combine pages from multiple PDFs | `merge.py` |
| Split a PDF by page ranges | `split.py` |
| Fill `/Tx` form fields in a PDF | `form_fill.py` |
| Build a new PDF from data | inline `reportlab` snippet, see Path D below |

---

## Path A: Extract

```bash
python {baseDir}/scripts/extract.py /path/to/doc.pdf --json
```

Output:

```json
{
  "pages": 12,
  "metadata": {"title": "...", "author": "..."},
  "text": [
    {"page": 1, "content": "..."},
    {"page": 2, "content": "..."}
  ],
  "tables": [
    {"page": 3, "rows": [["..."], ["..."]]}
  ]
}
```

Text extraction uses `pdfplumber` (already in the default dependencies), which
preserves column layout better than naive PDF text extraction. Tables come from
pdfplumber's `Page.extract_tables()` with default settings; for tricky layouts,
pass `--tables-strategy lines|text|explicit` to switch detection mode.
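
As a rough illustration of how the JSON above could be assembled, the sketch below calls pdfplumber directly. It is not the source of `extract.py`: the names `table_settings_for` and `extract_pdf` are hypothetical, and the mapping from `--tables-strategy` to pdfplumber's `vertical_strategy` and `horizontal_strategy` settings is an assumption. The `explicit` mode is left out because pdfplumber needs explicit line positions for it.

```python
def table_settings_for(strategy="lines"):
    """Build pdfplumber table_settings for a strategy name (assumed mapping)."""
    if strategy not in ("lines", "text"):
        raise ValueError(f"unsupported strategy in this sketch: {strategy!r}")
    return {"vertical_strategy": strategy, "horizontal_strategy": strategy}


def extract_pdf(path, strategy="lines"):
    """Return a dict shaped like the JSON output shown above."""
    import pdfplumber  # third-party; imported lazily so the helper above has no dependencies

    settings = table_settings_for(strategy)
    result = {"pages": 0, "metadata": {}, "text": [], "tables": []}
    with pdfplumber.open(path) as pdf:
        result["pages"] = len(pdf.pages)
        result["metadata"] = dict(pdf.metadata or {})
        for page in pdf.pages:
            number = page.page_number  # pdfplumber page numbers are 1-based
            # extract_text() can return None on pages without a text layer
            result["text"].append({"page": number, "content": page.extract_text() or ""})
            for rows in page.extract_tables(table_settings=settings):
                result["tables"].append({"page": number, "rows": rows})
    return result
```

When a table comes back split or merged incorrectly, switching from `lines` to `text` tells pdfplumber to infer cell boundaries from word alignment rather than drawn rules.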

This skill does not perform OCR and does not include Tesseract. For scanned
PDFs, use a separate skill that wraps an OCR engine.

---

## Path B: Merge / Split

Merge full files:

```bash
python {baseDir}/scripts/merge.py a.pdf b.pdf c.pdf --out combined.pdf
```

Or merge specific page ranges with the manifest form:

```bash
python {baseDir}/scripts/merge.py manifest.json --out combined.pdf
```

`manifest.json`:

```json
[
  {"file": "a.pdf", "pages": "1-3"},
  {"file": "b.pdf", "pages": "5,7,9-11"},
  {"file": "c.pdf"}
]
```

Page ranges are 1-based and comma-separated, with a hyphen marking a range.
Omit `pages` to include the whole file. `split.py` takes the same range syntax:

```bash
python {baseDir}/scripts/split.py input.pdf --pages "1-3,7,10-12" --out output_dir/
```

Each range writes one output file: `output_dir/input_001.pdf`,
`output_dir/input_002.pdf`, …
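
The range syntax and the output naming can be made concrete with a short sketch. It is not the code of `merge.py` or `split.py`: every function name is hypothetical, ranges are assumed to be inclusive, and the merge step assumes pypdf's `PdfWriter.append()` called with a list of zero-based page indices.

```python
import json


def parse_page_ranges(spec):
    """Parse '1-3,7,10-12' into [[1, 2, 3], [7], [10, 11, 12]] (1-based, inclusive)."""
    groups = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty range in {spec!r}")
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)
        end = int(end_text) if sep else start
        if start < 1 or end < start:
            raise ValueError(f"invalid range {part!r}")
        groups.append(list(range(start, end + 1)))
    return groups


def split_output_names(stem, spec):
    """Pair each range with a numbered file name such as input_001.pdf."""
    groups = parse_page_ranges(spec)
    return [(f"{stem}_{i:03d}.pdf", pages) for i, pages in enumerate(groups, start=1)]


def manifest_indices(entry):
    """Zero-based page indices for one manifest entry, or None for the whole file."""
    if "pages" not in entry:
        return None
    return [page - 1 for group in parse_page_ranges(entry["pages"]) for page in group]


def merge_from_manifest(manifest_path, out_path):
    """Append the pages listed in a manifest file, in order, and write out_path."""
    from pypdf import PdfWriter  # third-party, imported lazily

    with open(manifest_path, encoding="utf-8") as fh:
        manifest = json.load(fh)
    writer = PdfWriter()
    for entry in manifest:
        # pages=None appends the whole file
        writer.append(entry["file"], pages=manifest_indices(entry))
    writer.write(out_path)
    writer.close()
```

With this parser, `"1-3,7,10-12"` yields three groups, so a split under these assumptions would write three files, one per group.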

---

## Path C: Form fill

```bash
python {baseDir}/scripts/form_fill.py form.pdf data.json --out filled.pdf
```

`data.json` maps field name → string value:

```json
{
  "applicant_name": "Wei E.",
  "submission_date": "2026-05-06",
  "agreed": "Yes"
}
```

The script discovers fields via `pypdf.PdfReader.get_fields()` and updates
them with `PdfWriter.update_page_form_field_values()`. Fields not present in
the JSON are left untouched. Run with `--list-fields` to enumerate the form's
fields without filling.

Caveats:

- `/Btn` checkbox fields take the export value (often `Yes`, `On`, or `1`)
  rather than `true` — inspect with `--list-fields` to discover.
- AcroForm fills only. XFA forms (used by some legal templates) require
  Adobe-specific tooling and are out of scope.
- Changing field values in a signed PDF can invalidate its signature. Strip
  signatures explicitly with `--clear-signatures` if that is intended.
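
The fill step described above can be sketched with pypdf as follows. This is an illustration rather than the source of `form_fill.py`: `unmatched_keys` and `fill_form` are hypothetical names, and copying the document with `PdfWriter.append()` and passing `auto_regenerate=False` are assumptions about one reasonable way to do it. Checking the JSON keys first makes a misspelled field name visible instead of leaving the output unchanged.

```python
import json


def unmatched_keys(data, field_names):
    """Return the JSON keys that do not match any field name in the form."""
    known = set(field_names)
    return sorted(key for key in data if key not in known)


def fill_form(src, data_path, dst):
    """Fill AcroForm fields in src from a JSON file and write the result to dst."""
    from pypdf import PdfReader, PdfWriter  # third-party, imported lazily

    with open(data_path, encoding="utf-8") as fh:
        data = json.load(fh)

    reader = PdfReader(src)
    fields = reader.get_fields() or {}  # get_fields() returns None when there is no form
    missing = unmatched_keys(data, fields)
    if missing:
        raise KeyError(f"no such form fields: {missing}")

    writer = PdfWriter()
    writer.append(reader)  # copy all pages together with the form definition
    for page in writer.pages:
        if "/Annots" in page:  # skip pages that carry no form widgets
            # Only fields named in data are updated; the rest keep their values.
            writer.update_page_form_field_values(page, data, auto_regenerate=False)
    with open(dst, "wb") as out:
        writer.write(out)
```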

---

## Path D: Generate from scratch

Use `reportlab` directly when you need a new PDF:

```python
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import LETTER
from pathlib import Path

c = canvas.Canvas(str(Path("out.pdf")), pagesize=LETTER)
c.setFont("Helvetica-Bold", 18)
c.drawString(72, 720, "Q3 Review")
c.setFont("Helvetica", 11)
c.drawString(72, 696, "Revenue grew 18% year over year.")
c.showPage()
c.save()
```

For tables, headers/footers, and multi-column layouts, switch to
`reportlab.platypus` (`SimpleDocTemplate`, `Paragraph`, `Table`,
`PageBreak`). See [references/reportlab.md](references/reportlab.md).
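
A minimal platypus sketch, assuming the data arrives as a list of dictionaries. The helper names `rows_from_records` and `build_report` and the sample column names are hypothetical; the reportlab classes are the ones named above plus `Spacer` and `TableStyle`.

```python
def rows_from_records(records, columns):
    """Turn a list of dicts into table rows: one header row, then one row per record."""
    header = [list(columns)]
    body = [[str(record.get(col, "")) for col in columns] for record in records]
    return header + body


def build_report(path, title, records, columns):
    """Write a one-table report to path using reportlab.platypus."""
    from reportlab.lib import colors
    from reportlab.lib.pagesizes import LETTER
    from reportlab.lib.styles import getSampleStyleSheet
    from reportlab.platypus import Paragraph, SimpleDocTemplate, Spacer, Table, TableStyle

    styles = getSampleStyleSheet()
    # repeatRows=1 repeats the header row when the table spans several pages
    table = Table(rows_from_records(records, columns), repeatRows=1)
    table.setStyle(TableStyle([
        ("BACKGROUND", (0, 0), (-1, 0), colors.lightgrey),
        ("GRID", (0, 0), (-1, -1), 0.5, colors.grey),
    ]))
    doc = SimpleDocTemplate(path, pagesize=LETTER)
    doc.build([Paragraph(title, styles["Title"]), Spacer(1, 12), table])
```

A call such as `build_report("out.pdf", "Q3 Review", [{"region": "North", "revenue": 120}], ["region", "revenue"])` would produce a titled page with a two-column table; platypus handles page breaks, which the canvas example above would need to do by hand.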

---

## Boundary with `nano-pdf`

`nano-pdf` (sibling bundled skill) wraps an LLM that takes a page index and
a natural-language instruction. Use it when the change is "fix the typo on
page 1" or "make the title shorter". Use **this** skill when the change is
"merge these three PDFs", "extract the tables", or "fill the form". The two
do not overlap: if you find yourself reaching for `nano-pdf` to do a merge,
switch to `pdf-toolkit`; if you reach here to "rewrite page 5 to be friendlier",
switch back.

---

## Common pitfalls

| Symptom | Cause | Fix |
|---|---|---|
| Extracted text is empty | Scanned PDF, no text layer | OCR is out of scope; use a separate OCR skill |
| Garbled characters in extract | PDF uses a custom font encoding | Try `pdfplumber.open(path, laparams={...})` with `char_margin` adjustments for spacing or ordering problems; if the font has no usable Unicode mapping, fall back to OCR |
| Merged PDF is huge | Underlying PDFs include large embedded fonts | Call pypdf's `page.compress_content_streams()` on each page of the writer; this compresses page content but does not subset fonts |
| Form fill silently no-ops | Field name in JSON does not match PDF field name | Run with `--list-fields` first to see exact names |
| Pages out of order after split | Range overlap collapsed unexpectedly | Use disjoint ranges, e.g. `1-3,4-6` not `1-5,3-6` |

---

## Boundaries

- This skill works with text-based and form-based PDFs. Scanned image PDFs
  need OCR before any text path produces results.
- Encrypted PDFs are read-only here. Decryption requires the user-supplied
  password and is out of scope for this skill.
- For PDF-to-image rendering, use a separate skill that wraps Poppler or
  PyMuPDF.
- Digital signature operations (signing, verifying, revoking) are out of
  scope.