ClaudeWave
Skill · 654 repo stars · updated today

media-processing

This Claude Code skill processes video, audio, and image files through a configurable pipeline that ingests media, extracts keyframes, analyzes content with Claude and Gemini models, and enables querying or clipping based on intelligent analysis. Use it when you need to extract structured insights from video content, perform content-aware segmentation, or enable natural language Q&A over media assets.

Install in Claude Code
git clone --depth 1 https://github.com/vellum-ai/vellum-assistant /tmp/media-processing && cp -r /tmp/media-processing/assistant/src/config/bundled-skills/media-processing ~/.claude/skills/media-processing
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

Ingest and track processing of media files (video, audio, images) through a configurable 3-phase pipeline.

## End-to-End Workflow

The workflow has five sequential steps. Steps 2-4 (Preprocess, Map, Reduce / Query) make up the 3-phase processing pipeline:

1. **Ingest** (`ingest_media`) - Register a media file, detect MIME type, extract duration, deduplicate by content hash.
2. **Preprocess** (`extract_keyframes`) - Optionally detect dead time (off by default), segment the video into windows, extract downscaled keyframes, build a subject registry, and write a pipeline manifest.
3. **Map** (`analyze_keyframes`) - Send each segment's frames to Gemini 2.5 Flash with assistant-provided extraction instructions and a JSON Schema for guaranteed structured output. Supports concurrency pooling, cost tracking, resumability, and automatic retries.
4. **Reduce / Query** (`query_media`) - Send all map output to Claude for intelligent analysis and Q&A. Supports arbitrary natural language queries about video content.
5. **Clip** (`generate_clip`) - Extract video clips around specific moments.

The processing pipeline service (`services/processing-pipeline.ts`) orchestrates phases 2-4 automatically with retries, resumability, and cancellation support.
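The first four steps can be sketched as a sequence of tool calls. This is a minimal sketch, not the skill's actual invocation API: `callTool`, the `file_path` input name, the example prompt and schema, and the assumption that `ingest_media` returns an `asset_id` are all hypothetical. Only the tool names and the `extract_keyframes` defaults come from this document.

```typescript
// Hypothetical transport: how tools are actually invoked is not specified here.
type ToolCall = (
  tool: string,
  input: Record<string, unknown>,
) => Promise<Record<string, unknown>>;

async function runWorkflow(
  callTool: ToolCall,
  filePath: string,
  query: string,
): Promise<Record<string, unknown>> {
  // 1. Ingest. Assumptions: the input key is `file_path` and the
  //    result carries an `asset_id`.
  const ingested = await callTool("ingest_media", { file_path: filePath });
  const asset_id = String(ingested["asset_id"]);

  // 2. Preprocess, with the documented defaults spelled out.
  await callTool("extract_keyframes", {
    asset_id,
    interval_seconds: 1,
    segment_duration: 15,
  });

  // 3. Map. The prompt and schema here are illustrative only.
  await callTool("analyze_keyframes", {
    asset_id,
    system_prompt: "Describe what happens in each segment.",
    output_schema: {
      type: "object",
      properties: { summary: { type: "string" } },
      required: ["summary"],
    },
  });

  // 4. Reduce / Query.
  return callTool("query_media", { asset_id, query });
}
```

Injecting `callTool` keeps the ordering logic testable with a stub and independent of whatever transport the host actually uses.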

## Tools

### ingest_media

Register a media file for processing. Accepts an absolute file path, validates the file exists, detects MIME type, extracts duration (for video/audio via ffprobe), and registers the asset with content-hash deduplication.
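Content-hash deduplication can be illustrated with a small in-memory registry. This is a sketch under stated assumptions: the document does not say which hash function or storage the skill uses, so SHA-256 and the `AssetRegistry` class (including its ID format) are hypothetical.

```typescript
import { createHash } from "node:crypto";

interface Asset {
  id: string;
  path: string;
  contentHash: string;
}

// Hypothetical in-memory registry keyed by content hash.
class AssetRegistry {
  private byHash = new Map<string, Asset>();

  register(path: string, content: Uint8Array): { asset: Asset; deduplicated: boolean } {
    // Assumption: SHA-256 over the full file contents.
    const contentHash = createHash("sha256").update(content).digest("hex");

    const existing = this.byHash.get(contentHash);
    if (existing) {
      // Same bytes already registered: return the existing asset unchanged.
      return { asset: existing, deduplicated: true };
    }

    const asset: Asset = { id: `asset-${this.byHash.size + 1}`, path, contentHash };
    this.byHash.set(contentHash, asset);
    return { asset, deduplicated: false };
  }
}
```

Keying on the hash rather than the path means a renamed or copied file maps back to the same asset.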

### media_status

Query the processing status of a media asset. Returns the asset metadata along with per-stage progress details. Use this to monitor pipeline progress.

### extract_keyframes

Preprocess a video asset: optionally detect dead time via ffmpeg's `mpdecimate` filter, segment the video into windows, extract downscaled keyframes at regular intervals, build a subject registry, and write a pipeline manifest.

Parameters:

- `asset_id` (required) - ID of the media asset.
- `interval_seconds` - Interval between keyframes (default: 1s). Use 0.5s for sports/action content where frame density matters.
- `segment_duration` - Duration of each segment window (default: 15s).
- `dead_time_threshold` - Sensitivity for dead-time detection (default: 0.02).
- `section_config` - Path to a JSON file with manual section boundaries.
- `detect_dead_time` - Whether to detect and skip dead time (default: false). Dead-time detection can be too aggressive for continuous action video like sports - it may incorrectly skip live play. Enable only for content with clear idle periods (e.g., lectures, surveillance footage).
- `short_edge` - Short edge resolution for downscaled frames in pixels (default: 480).
- `include_audio` - Whether to extract and transcribe audio for each segment (default: false). When enabled, each segment's audio is transcribed using the configured STT service and stored alongside visual frames.
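To see how `segment_duration` and `interval_seconds` interact, the sketch below plans segment windows and keyframe timestamps for a given duration. It assumes fixed-length windows starting at 0 with the first frame at each window's start; the real preprocessing (dead-time skipping, manual sections) may differ, and `planSegments` is a hypothetical helper, not part of the skill.

```typescript
interface SegmentWindow {
  index: number;
  start: number; // seconds
  end: number; // seconds, clamped to the video duration
  frameTimes: number[]; // keyframe timestamps in seconds
}

function planSegments(
  durationSeconds: number,
  segmentDuration = 15, // documented default
  intervalSeconds = 1, // documented default
): SegmentWindow[] {
  if (durationSeconds < 0 || segmentDuration <= 0 || intervalSeconds <= 0) {
    throw new RangeError("duration must be >= 0; segment and interval must be > 0");
  }

  const windows: SegmentWindow[] = [];
  for (let index = 0; index * segmentDuration < durationSeconds; index++) {
    const start = index * segmentDuration;
    const end = Math.min(start + segmentDuration, durationSeconds);

    // Multiply instead of accumulating to avoid floating-point drift.
    const frameCount = Math.ceil((end - start) / intervalSeconds);
    const frameTimes: number[] = [];
    for (let k = 0; k < frameCount; k++) {
      frameTimes.push(start + k * intervalSeconds);
    }

    windows.push({ index, start, end, frameTimes });
  }
  return windows;
}
```

Under these assumptions, a 40-second video with the defaults yields three windows (0-15, 15-30, 30-40), and halving the interval to 0.5s doubles the frames per window.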

### analyze_keyframes

Map video segments through Gemini's structured output API. Supports two modes:

- **`keyframes`** (default) - Reads frames from the preprocess manifest, sends each segment's images to Gemini. Requires `extract_keyframes` to be run first. Best for longer videos (> 1 hour) or when you need fine-grained control over frame selection (interval, segment duration, dead-time skipping).
- **`direct_video`** - Uploads the video file directly to Gemini's Files API. Gemini sees actual motion and temporal context instead of static frames. Best for shorter videos (< 1 hour) where temporal context matters (detecting actions, transitions, motion patterns). Has a 2 GB file size limit. Does not require `extract_keyframes` preprocessing.

Both modes produce the same `MapOutput` format, so `query_media` works identically regardless of which mode was used.
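The guidance on choosing a mode can be condensed into a small heuristic. This is a sketch, not logic the skill applies: `chooseMode` is hypothetical, "2 GB" is read here as 2 GiB, and a video of exactly one hour (which the guidance leaves open) falls back to `keyframes`.

```typescript
type AnalyzeMode = "keyframes" | "direct_video";

const ONE_HOUR_SECONDS = 3600;
// Assumption: the 2 GB limit is interpreted as 2 GiB.
const DIRECT_VIDEO_MAX_BYTES = 2 * 1024 ** 3;

function chooseMode(
  durationSeconds: number,
  fileSizeBytes: number,
  needsTemporalContext: boolean,
): AnalyzeMode {
  // direct_video only fits shorter videos under the upload size limit.
  const fitsDirectVideo =
    durationSeconds < ONE_HOUR_SECONDS && fileSizeBytes <= DIRECT_VIDEO_MAX_BYTES;

  // Prefer direct_video only when motion and temporal context matter;
  // otherwise use the default keyframes mode.
  return needsTemporalContext && fitsDirectVideo ? "direct_video" : "keyframes";
}
```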

Parameters:

- `asset_id` (required) - ID of the media asset.
- `system_prompt` (required) - Extraction instructions for Gemini.
- `output_schema` (required) - JSON Schema for structured output.
- `mode` - Analysis mode: `'keyframes'` (default) or `'direct_video'`.
- `context` - Additional context to include in the prompt.
- `model` - Gemini model to use (default: `gemini-2.5-flash`).
- `concurrency` - Maximum concurrent API requests (default: 10, keyframes mode only).
- `max_retries` - Retry attempts per segment on failure (default: 3).
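The parameters above can be written down as a typed input with the documented defaults applied. This is a sketch: the `AnalyzeKeyframesInput` type, the `withDefaults` helper, and the example schema are hypothetical, while the parameter names and default values are taken from the list above.

```typescript
type AnalyzeMode = "keyframes" | "direct_video";

interface AnalyzeKeyframesInput {
  asset_id: string;
  system_prompt: string;
  output_schema: Record<string, unknown>; // a JSON Schema object
  mode?: AnalyzeMode;
  context?: string;
  model?: string;
  concurrency?: number;
  max_retries?: number;
}

// Fill in the documented defaults for any omitted optional parameter.
function withDefaults(input: AnalyzeKeyframesInput) {
  return {
    ...input,
    mode: input.mode ?? "keyframes",
    model: input.model ?? "gemini-2.5-flash",
    concurrency: input.concurrency ?? 10,
    max_retries: input.max_retries ?? 3,
  };
}

// Illustrative request; the asset ID, prompt, and schema are made up.
const exampleInput: AnalyzeKeyframesInput = {
  asset_id: "asset-1",
  system_prompt: "List each distinct event visible in the segment.",
  output_schema: {
    type: "object",
    properties: {
      events: {
        type: "array",
        items: {
          type: "object",
          properties: {
            timestamp_seconds: { type: "number" },
            description: { type: "string" },
          },
          required: ["timestamp_seconds", "description"],
        },
      },
    },
    required: ["events"],
  },
};
```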

### query_media

Query video analysis data using natural language. Sends map output (from `analyze_keyframes`) to Claude for intelligent analysis and Q&A. Supports arbitrary questions about video content.

Parameters:

- `asset_id` (required) - ID of the media asset.
- `query` (required) - Natural language query about the video data.
- `system_prompt` - Optional system prompt for Claude.
- `model` - Claude model to use (default: `claude-sonnet-4-6`).

### generate_clip

Extract a video clip from a media asset using ffmpeg. Applies configurable pre/post-roll padding (clamped to file boundaries) and outputs the clip as a temporary file.
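The padding-and-clamping step amounts to a few lines of arithmetic. In this minimal sketch, `clipRange` and its parameter names are hypothetical, since the tool's actual parameters and default padding values are not listed here.

```typescript
interface ClipRange {
  start: number; // seconds
  end: number; // seconds
}

function clipRange(
  momentStart: number,
  momentEnd: number,
  fileDuration: number,
  preRollSeconds: number,
  postRollSeconds: number,
): ClipRange {
  if (momentEnd < momentStart) {
    throw new RangeError("momentEnd must not precede momentStart");
  }
  return {
    // Pad backwards, but never before the start of the file.
    start: Math.max(0, momentStart - preRollSeconds),
    // Pad forwards, but never past the end of the file.
    end: Math.min(fileDuration, momentEnd + postRollSeconds),
  };
}
```

For a moment at 30-40s in a 120s file with 3s of padding on each side, this yields 27-43s; a moment at 2-10s in a 12s file with 5s of padding is clamped to 0-12s.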

## Services

### Processing Pipeline (services/processing-pipeline.ts)

Orchestrates the full processing pipeline with reliability features:

- **Sequential execution**: Stages run in order: preprocess, then map, then reduce.
- **Retries**: Each stage is retried with exponential backoff and jitter (configurable max retries and base delay).
- **Resumability**: Checks `processing_stages` to find the last completed stage and resumes from there. Safe to restart after crashes.
- **Cancellation**: Cooperative cancellation via asset status. Set asset status to `cancelled` and the pipeline stops between stages.
- **Idempotency**: Re-ingesting the same file hash is a no-op. Re-running a fully completed pipeline is also a no-op.
- **Graceful degradation**: If a stage fails mid-batch (e.g., Gemini API errors), partial results are saved. The stage is marked as failed with the error details, and the pipeline stops without losing work.
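Two of these features, retries with backoff and resumability, can be sketched briefly. The document does not specify the jitter strategy or how stage state is stored, so this sketch assumes "full jitter" (a uniform random delay below the exponential ceiling) and models completed stages as a set. `backoffDelayMs` and `nextStage` are hypothetical names.

```typescript
const STAGES = ["preprocess", "map", "reduce"] as const;
type Stage = (typeof STAGES)[number];

// Exponential backoff with full jitter (assumed variant).
// `attempt` is 0-based; `random` is injectable for deterministic tests.
function backoffDelayMs(
  attempt: number,
  baseDelayMs: number,
  random: () => number = Math.random,
): number {
  const ceiling = baseDelayMs * 2 ** attempt;
  return Math.floor(random() * ceiling);
}

// Resume point: the first stage, in pipeline order, not yet completed.
// Returns null when every stage is done, so a re-run is a no-op.
function nextStage(completed: ReadonlySet<Stage>): Stage | null {
  for (const stage of STAGES) {
    if (!completed.has(stage)) return stage;
  }
  return null;
}
```

Jitter spreads out retries from concurrent workers so they do not hit a recovering API in lockstep.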

### Preprocess (services/preprocess.ts)

Handles dead-time detection, video segmentation, keyframe extraction, and subject registry building. Writes a pipeline manifest consumed by the Map phase.

### Gemini Map (services/gemini-map.t