pdf-manipulation
Manipulate PDF files including merge, split, extract, redact, convert, and secure workflows.
git clone --depth 1 https://github.com/besoeasy/open-skills /tmp/pdf-manipulation && cp -r /tmp/pdf-manipulation/skills/pdf-manipulation ~/.claude/skills/pdf-manipulation
# PDF Manipulation Skill
Merge, split, extract, redact, and transform PDF files using free command-line tools and libraries. Covers common PDF operations for document automation workflows.
## When to use
- Merge multiple PDFs into one document
- Split large PDFs into separate files or page ranges
- Extract text, images, or specific pages
- Redact sensitive information
- Add watermarks, passwords, or metadata
- Convert PDFs to images or other formats
## Required tools
- **pdftk** — Swiss Army knife for PDF manipulation (merge, split, rotate, encrypt)
- **qpdf** — PDF transformation and encryption (linearize, decrypt, repair)
- **pdftotext / pdfimages** — Part of poppler-utils (extract text and images)
- **ghostscript (gs)** — Advanced PDF processing, compression, and conversion
### Installation
```bash
# Ubuntu/Debian
sudo apt-get install pdftk qpdf poppler-utils ghostscript
# macOS (Homebrew)
brew install pdftk-java qpdf poppler ghostscript
# For Node.js: npm i pdf-lib pdf-parse (pure JS, no system deps)
# For Python: pip install pypdf (PyPDF2 is deprecated in favor of pypdf)
```
## Skills
### Merge PDFs
```bash
# Using pdftk (keeps form fields; bookmarks are typically not carried over)
pdftk file1.pdf file2.pdf file3.pdf cat output merged.pdf
# Using ghostscript (better compression)
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=merged.pdf file1.pdf file2.pdf file3.pdf
# Using qpdf (preserves structure)
qpdf --empty --pages file1.pdf file2.pdf file3.pdf -- merged.pdf
```
**Node.js (pdf-lib):**
```javascript
const { PDFDocument } = require('pdf-lib');
const fs = require('fs');
async function mergePDFs(files, output) {
  const mergedPdf = await PDFDocument.create();
  for (const file of files) {
    const pdfBytes = fs.readFileSync(file);
    const pdf = await PDFDocument.load(pdfBytes);
    // Copy every page of the source document into the merged document
    const pages = await mergedPdf.copyPages(pdf, pdf.getPageIndices());
    pages.forEach(page => mergedPdf.addPage(page));
  }
  const mergedBytes = await mergedPdf.save();
  fs.writeFileSync(output, mergedBytes);
}
// mergePDFs(['file1.pdf', 'file2.pdf'], 'merged.pdf');
```
### Split PDF (by page or range)
```bash
# Split every page into separate files
pdftk input.pdf burst output page_%02d.pdf
# Extract specific pages (e.g., pages 1-5 and 10)
pdftk input.pdf cat 1-5 10 output subset.pdf
# Extract page ranges with qpdf
qpdf input.pdf --pages . 1-5 -- output.pdf
# Split every N pages (e.g., every 2 pages) with qpdf
# %d in the output name is replaced with each chunk's page range
qpdf --split-pages=2 input.pdf chunk_%d.pdf
```
**Node.js (pdf-lib):**
```javascript
const { PDFDocument } = require('pdf-lib');
const fs = require('fs');
async function extractPages(inputPath, pages, outputPath) {
  const pdfBytes = fs.readFileSync(inputPath);
  const pdfDoc = await PDFDocument.load(pdfBytes);
  const newPdf = await PDFDocument.create();
  for (const pageNum of pages) {
    // pages are 1-based; pdf-lib page indices are 0-based
    const [page] = await newPdf.copyPages(pdfDoc, [pageNum - 1]);
    newPdf.addPage(page);
  }
  const newBytes = await newPdf.save();
  fs.writeFileSync(outputPath, newBytes);
}
// extractPages('input.pdf', [1, 3, 5], 'output.pdf');
```
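For splitting every N pages from Node.js, the page ranges can be computed first and then passed to any of the tools above. The following is a minimal sketch; `chunkPageRanges` is a hypothetical helper name, not part of pdf-lib or pdftk, and it assumes the page count is already known (for example from `pdfDoc.getPageCount()` in pdf-lib).

```javascript
// Hypothetical helper: list the 1-based page ranges needed to split a
// document of `totalPages` pages into chunks of `chunkSize` pages.
function chunkPageRanges(totalPages, chunkSize) {
  if (!Number.isInteger(totalPages) || totalPages < 1) {
    throw new RangeError('totalPages must be a positive integer');
  }
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const ranges = [];
  for (let first = 1; first <= totalPages; first += chunkSize) {
    // The last chunk may be shorter than chunkSize
    const last = Math.min(first + chunkSize - 1, totalPages);
    ranges.push(first === last ? `${first}` : `${first}-${last}`);
  }
  return ranges;
}

// chunkPageRanges(5, 2) -> ['1-2', '3-4', '5']
```

Each returned range can be used as a `cat` argument, e.g. `pdftk input.pdf cat 1-2 output part_1.pdf`.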
### Extract text
```bash
# Extract all text (reading order)
pdftotext input.pdf output.txt
# Extract text in content-stream order (no layout analysis)
pdftotext -raw input.pdf output.txt
# Extract specific pages
pdftotext -f 1 -l 5 input.pdf output.txt
# Preserve the physical layout and write to stdout
pdftotext -layout input.pdf -
```
**Node.js (pdf-parse):**
```javascript
const fs = require('fs');
const pdf = require('pdf-parse');
async function extractText(filePath) {
  const dataBuffer = fs.readFileSync(filePath);
  const data = await pdf(dataBuffer);
  return data.text;
}
// extractText('input.pdf').then(console.log);
```
### Extract images
```bash
# Extract all images from PDF
pdfimages -all input.pdf output_prefix
# Output: output_prefix-000.png, output_prefix-001.jpg, etc.
# Extract only JPEGs
pdfimages -j input.pdf output_prefix
```
### Redact / Remove pages
```bash
# Remove specific pages (e.g., remove pages 2-4)
pdftk input.pdf cat 1 5-end output redacted.pdf
# Keep only specific pages
pdftk input.pdf cat 1-10 20-30 output selected.pdf
```
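Because pdftk's `cat` takes the pages to keep, not the pages to drop, removing pages means inverting the list. A small sketch of that inversion follows; `keepRanges` is a hypothetical helper name, and it assumes the total page count is known in advance.

```javascript
// Hypothetical helper: given the page count and the 1-based pages to drop,
// return the range arguments for `pdftk input.pdf cat ... output out.pdf`.
function keepRanges(totalPages, pagesToRemove) {
  const remove = new Set(pagesToRemove);
  const ranges = [];
  let start = null; // first page of the run currently being kept
  // Iterate one past the end so a trailing run gets flushed
  for (let page = 1; page <= totalPages + 1; page++) {
    const keep = page <= totalPages && !remove.has(page);
    if (keep && start === null) {
      start = page;
    } else if (!keep && start !== null) {
      const end = page - 1;
      ranges.push(start === end ? `${start}` : `${start}-${end}`);
      start = null;
    }
  }
  return ranges;
}

// keepRanges(10, [2, 3, 4]) -> ['1', '5-10']
```

For a 10-page document this yields the same selection as `cat 1 5-end` in the shell example, with the final page number written out explicitly.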
### Add password protection
```bash
# Encrypt PDF with password
pdftk input.pdf output secured.pdf user_pw mypassword
# Remove password
pdftk secured.pdf input_pw mypassword output unlocked.pdf
# Using qpdf (AES-256)
qpdf --encrypt userpass ownerpass 256 -- input.pdf output.pdf
```
**Node.js (qpdf via `child_process`):**
```javascript
// pdf-lib has no built-in encryption: save() does not accept password
// options, so this wraps the qpdf command shown above instead.
const { execFileSync } = require('child_process');
function encryptPDF(inputPath, userPassword, ownerPassword, outputPath) {
  // 256 selects AES-256; passing an argument array avoids shell quoting issues
  execFileSync('qpdf', [
    '--encrypt', userPassword, ownerPassword, '256', '--',
    inputPath, outputPath,
  ]);
}
// encryptPDF('input.pdf', 'userpass', 'ownerpass', 'secured.pdf');
```
### Rotate pages
```bash
# Rotate all pages 90 degrees clockwise
pdftk input.pdf cat 1-endright output rotated.pdf
# Rotate specific pages
pdftk input.pdf cat 1-5 6right 7-end output rotated.pdf
# Options: right (+90°), left (-90°, i.e. 270° clockwise), down (+180°), relative to the page's current rotation
```
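When rotation amounts come from a script as degrees, they need to be mapped to pdftk's keywords. The sketch below shows one way to do that; `rotateSpec` and `PDFTK_ROTATIONS` are hypothetical names, and only the three relative keywords listed above are used.

```javascript
// Relative rotation keywords accepted in pdftk page ranges,
// keyed by clockwise degrees.
const PDFTK_ROTATIONS = { 90: 'right', 180: 'down', 270: 'left' };

// Hypothetical helper: append the rotation keyword for `degrees`
// (clockwise, any multiple of 90) to a pdftk page range.
function rotateSpec(range, degrees) {
  // Normalize to 0..359 so that -90 and 270 are treated the same
  const normalized = ((degrees % 360) + 360) % 360;
  if (normalized === 0) return range; // no rotation needed
  const keyword = PDFTK_ROTATIONS[normalized];
  if (!keyword) {
    throw new RangeError('degrees must be a multiple of 90');
  }
  return `${range}${keyword}`;
}

// rotateSpec('1-end', 90) -> '1-endright'
// rotateSpec('6', -90)    -> '6left'
```

The result slots into the `cat` argument list, e.g. `pdftk input.pdf cat 1-5 6left 7-end output rotated.pdf`.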
### Compress / Reduce file size
```bash
# Using ghostscript (adjust quality)
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook \
-dNOPAUSE -dQUIET -dBATCH -sOutputFile=compressed.pdf input.pdf
# Quality settings:
# /screen - low quality (72 dpi)
# /ebook - medium (150 dpi)
# /printer - high (300 dpi)
# /prepress - highest (300 dpi, preserves color)
# Using qpdf (lossless compression)
qpdf --linearize --object-streams=generate input.pdf compressed.pdf
```
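To drive the ghostscript command from Node.js, it helps to build the argument list separately from running it, so the preset can be validated first. This is a sketch under the assumption that `gs` is on the `PATH`; `gsCompressArgs`, `compressPDF`, and `GS_PRESETS` are hypothetical names introduced for illustration.

```javascript
const { execFileSync } = require('child_process');

// The -dPDFSETTINGS presets listed in the shell example above
const GS_PRESETS = ['screen', 'ebook', 'printer', 'prepress'];

// Build the ghostscript argument list for a given quality preset.
function gsCompressArgs(inputPath, outputPath, preset = 'ebook') {
  if (!GS_PRESETS.includes(preset)) {
    throw new RangeError(`preset must be one of: ${GS_PRESETS.join(', ')}`);
  }
  return [
    '-sDEVICE=pdfwrite',
    '-dCompatibilityLevel=1.4',
    `-dPDFSETTINGS=/${preset}`,
    '-dNOPAUSE',
    '-dQUIET',
    '-dBATCH',
    `-sOutputFile=${outputPath}`,
    inputPath,
  ];
}

// Run ghostscript; the argument array is passed without a shell,
// so file names containing spaces need no extra quoting.
function compressPDF(inputPath, outputPath, preset) {
  execFileSync('gs', gsCompressArgs(inputPath, outputPath, preset), {
    stdio: 'inherit',
  });
}

// compressPDF('input.pdf', 'compressed.pdf', 'screen');
```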
### Convert PDF to images
```bash
# Convert each page to PNG (300 DPI)
pdftoppm -png -r 300 input.pdf output_prefix
# Output: output_prefix-1.png, output_prefix-2.png, etc. (page numbers are zero-padded for longer documents)
# Convert to JPEG
pdftoppm -jpeg -r 150 input.pdf output_prefix
# Using ImageMagick (alternative; on ImageMagick 7 the command is `magick` instead of `convert`)
convert -density 300 input.pdf output_%03d.png
```
### Add watermark
```bash
# Overlay watermark.pdf on every page
pdftk input.pdf stamp watermark.pdf output watermarked.pdf
```