media-processing
This Claude Code skill provides access to FFmpeg and ImageMagick command-line tools for comprehensive media processing tasks including video encoding with modern codecs, audio extraction, image manipulation, and streaming manifest creation. Use it when converting between media formats, resizing or applying effects to images, transcoding video files, generating thumbnails, batch processing multiple files, creating HLS or DASH streaming outputs, or optimizing media file sizes and quality across 100+ supported formats with optional hardware acceleration.
git clone --depth 1 https://github.com/mrgoonie/claudekit-skills /tmp/media-processing && cp -r /tmp/media-processing/.claude/skills/media-processing ~/.claude/skills/media-processingSKILL.md
# Media Processing Skill
Process video, audio, and images using FFmpeg and ImageMagick command-line tools for conversion, optimization, streaming, and manipulation tasks.
## When to Use This Skill
Use when:
- Converting media formats (video, audio, images)
- Encoding video with codecs (H.264, H.265, VP9, AV1)
- Processing images (resize, crop, effects, watermarks)
- Extracting audio from video
- Creating streaming manifests (HLS/DASH)
- Generating thumbnails and previews
- Batch processing media files
- Optimizing file sizes and quality
- Applying filters and effects
- Creating composite images or videos
## Tool Selection Guide
### FFmpeg: Video/Audio Processing
Use FFmpeg for:
- Video encoding, conversion, transcoding
- Audio extraction, conversion, mixing
- Live streaming (RTMP, HLS, DASH)
- Video filters (scale, crop, rotate, overlay)
- Hardware-accelerated encoding
- Media file inspection (ffprobe)
- Frame extraction, concatenation
- Codec selection and optimization
### ImageMagick: Image Processing
Use ImageMagick for:
- Image format conversion (PNG, JPEG, WebP, GIF)
- Resizing, cropping, transformations
- Batch image processing (mogrify)
- Visual effects (blur, sharpen, sepia)
- Text overlays and watermarks
- Image composition and montages
- Color adjustments, filters
- Thumbnail generation
### Decision Matrix
| Task | Tool | Why |
|------|------|-----|
| Video encoding | FFmpeg | Native video codec support |
| Audio extraction | FFmpeg | Direct stream manipulation |
| Image resize | ImageMagick | Optimized for still images |
| Batch images | ImageMagick | mogrify for in-place edits |
| Video thumbnails | FFmpeg | Frame extraction built-in |
| GIF creation | FFmpeg or ImageMagick | FFmpeg for video source, ImageMagick for images |
| Streaming | FFmpeg | Live streaming protocols |
| Image effects | ImageMagick | Rich filter library |
## Installation
### macOS
```bash
brew install ffmpeg imagemagick
```
### Ubuntu/Debian
```bash
sudo apt-get install ffmpeg imagemagick
```
### Windows
```bash
# Using winget
winget install ffmpeg
winget install ImageMagick.ImageMagick
# Or download binaries
# FFmpeg: https://ffmpeg.org/download.html
# ImageMagick: https://imagemagick.org/script/download.php
```
### Verify Installation
```bash
ffmpeg -version
ffprobe -version
magick -version
# or
convert -version
```
## Quick Start Examples
### Video Conversion
```bash
# Convert format (copy streams, fast)
ffmpeg -i input.mkv -c copy output.mp4
# Re-encode with H.264
ffmpeg -i input.avi -c:v libx264 -crf 22 -c:a aac output.mp4
# Resize video to 720p
ffmpeg -i input.mp4 -vf scale=-1:720 -c:a copy output.mp4
```
### Audio Extraction
```bash
# Extract audio (no re-encoding)
ffmpeg -i video.mp4 -vn -c:a copy audio.m4a
# Convert to MP3
ffmpeg -i video.mp4 -vn -q:a 0 audio.mp3
```
### Image Processing
```bash
# Convert format
magick input.png output.jpg
# Resize maintaining aspect ratio
magick input.jpg -resize 800x600 output.jpg
# Create square thumbnail
magick input.jpg -resize 200x200^ -gravity center -extent 200x200 thumb.jpg
```
### Batch Image Resize
```bash
# Resize all JPEGs to 800px width
mogrify -resize 800x -quality 85 *.jpg
# Output to separate directory
mogrify -path ./output -resize 800x600 *.jpg
```
### Video Thumbnail
```bash
# Extract frame at 5 seconds
ffmpeg -ss 00:00:05 -i video.mp4 -vframes 1 -vf scale=320:-1 thumb.jpg
```
### HLS Streaming
```bash
# Generate HLS playlist
ffmpeg -i input.mp4 \
-c:v libx264 -preset fast -crf 22 -g 48 \
-c:a aac -b:a 128k \
-f hls -hls_time 6 -hls_playlist_type vod \
playlist.m3u8
```
### Image Watermark
```bash
# Add watermark to corner
magick input.jpg watermark.png -gravity southeast \
-geometry +10+10 -composite output.jpg
```
## Common Workflows
### Optimize Video for Web
```bash
# H.264 with good compression
ffmpeg -i input.mp4 \
-c:v libx264 -preset slow -crf 23 \
-c:a aac -b:a 128k \
-movflags +faststart \
output.mp4
```
### Create Responsive Images
```bash
# Generate multiple sizes
for size in 320 640 1024 1920; do
magick input.jpg -resize ${size}x -quality 85 "output-${size}w.jpg"
done
```
### Extract Video Segment
```bash
# From 1:30 to 3:00 (re-encode for precision)
ffmpeg -i input.mp4 -ss 00:01:30 -to 00:03:00 \
-c:v libx264 -c:a aac output.mp4
```
### Batch Image Optimization
```bash
# Convert PNG to optimized JPEG
mogrify -path ./optimized -format jpg -quality 85 -strip *.png
```
### Video GIF Creation
```bash
# High quality GIF with palette
ffmpeg -i input.mp4 -vf "fps=15,scale=640:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" output.gif
```
### Image Blur Effect
```bash
# Gaussian blur
magick input.jpg -gaussian-blur 0x8 output.jpg
```
## Advanced Techniques
### Multi-Pass Video Encoding
```bash
# Pass 1 (analysis)
ffmpeg -y -i input.mkv -c:v libx264 -b:v 2600k -pass 1 -an -f null /dev/null
# Pass 2 (encoding)
ffmpeg -i input.mkv -c:v libx264 -b:v 2600k -pass 2 -c:a aac output.mp4
```
### Hardware-Accelerated Encoding
```bash
# NVIDIA NVENC
ffmpeg -hwaccel cuda -i input.mp4 -c:v h264_nvenc -preset fast -crf 22 output.mp4
# Intel QuickSync
ffmpeg -hwaccel qsv -c:v h264_qsv -i input.mp4 -c:v h264_qsv output.mp4
```
### Complex Image Pipeline
```bash
# Resize, crop, border, adjust
magick input.jpg \
-resize 1000x1000^ \
-gravity center \
-crop 1000x1000+0+0 +repage \
-bordercolor black -border 5x5 \
-brightness-contrast 5x10 \
-quality 90 \
output.jpg
```
### Video Filter Chains
```bash
# Scale, denoise, watermark
ffmpeg -i video.mp4 -i logo.png \
-filter_complex "[0:v]scale=1280:720,hqdn3d[v];[v][1:v]overlay=10:10" \
-c:a copy output.mp4
```
### Animated GIF from Images
```bash
# Create with delay
magick -delay 100 -loop 0 frame*.png animated.gif
# Optimize size
magick animated.gif -fuzz 5% -layers Optimize optimized.gif
```
## Media Analysis
### Inspect Video Properties
```bash
# Detailed JSON output
ffprobe -v quiet -print_format jsoManage MCP (Model Context Protocol) server integrations - discover tools/prompts/resources, analyze relevance for tasks, and execute MCP capabilities. Use when need to work with MCP servers, discover available MCP tools, filter MCP capabilities for specific tasks, execute MCP tools programmatically, or implement MCP client functionality. Keeps main context clean by handling MCP discovery in subagent context.
Stage all files and create a commit.
Stage, commit and push all code in the current branch
Create a pull request
Create a new agent skill
Utilize tools of Model Context Protocol (MCP) servers
Create aesthetically beautiful interfaces following proven design principles. Use when building UI/UX, analyzing designs from inspiration sites, generating design images with ai-multimodal, implementing visual hierarchy and color theory, adding micro-interactions, or creating design documentation. Includes workflows for capturing and analyzing inspiration screenshots with chrome-devtools and ai-multimodal, iterative design image generation until aesthetic standards are met, and comprehensive design system guidance covering BEAUTIFUL (aesthetic principles), RIGHT (functionality/accessibility), SATISFYING (micro-interactions), and PEAK (storytelling) stages. Integrates with chrome-devtools, ai-multimodal, media-processing, ui-styling, and web-frameworks skills.
Process and generate multimedia content using Google Gemini API. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (captioning, object detection, OCR, visual Q&A, segmentation), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows up to 2M tokens.