video-clipper
Video Clipper transforms long-form video content like podcasts and interviews into vertical short-form clips optimized for Instagram Reels, TikTok, and YouTube Shorts. The skill automatically transcribes audio, identifies compelling moments, extracts clips with speaker-tracked reframing from landscape to portrait orientation, and applies animated captions. Use this when repurposing talking-head content into multiple viral-ready clips with minimal manual editing.
`git clone --depth 1 https://github.com/gooseworks-ai/goose-skills /tmp/video-clipper && cp -r /tmp/video-clipper/skills/design/packs/video-production/video-clipper ~/.claude/skills/video-clipper`
# Video Clipper
Takes a long-form video and produces ready-to-post short-form vertical clips with speaker-tracked framing and professional animated captions. Works with podcasts, interviews, talks, and any talking-head content.
---
## Requirements
- **FFmpeg** installed and available in PATH (`brew install ffmpeg` on macOS, `apt install ffmpeg` on Linux)
- **Python 3** with `openai-whisper` and `requests` packages (`pip install openai-whisper requests`). **Note:** `openai-whisper` installs PyTorch (~2GB download). This skill uses `openai-whisper` instead of the lighter `whisper-cpp` because it provides word-level timestamps needed for accurate viral moment scoring.
- **yt-dlp** installed (for YouTube/URL downloads) — `brew install yt-dlp` on macOS, `pip install yt-dlp` on Linux
- **API Keys** in a `.env` file (project root or any parent directory):
  - `KLAP_API_KEY` — from [klap.app](https://klap.app) (reframing with speaker tracking)
  - `CAPTIONS_AI_API_KEY` — from [captions.ai](https://captions.ai) / [platform.mirage.app](https://platform.mirage.app) (animated captions)
**Before starting:** Verify that FFmpeg, yt-dlp, and the Python packages are installed. If any are missing, instruct the user to install them before proceeding.
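A minimal preflight sketch for that check follows. It looks for the binaries on PATH, checks that the Python modules are importable (`openai-whisper` is imported as `whisper`), and searches for `.env` in the working directory and its parents. The helper names, the inclusion of `ffprobe` (used in Step 1), and the simplified `.env` parser (no interpolation, no multi-line values, no `python-dotenv`) are all assumptions made for illustration.

```python
import importlib.util
import shutil
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical preflight lists; ffprobe ships with FFmpeg and is used in Step 1
REQUIRED_BINARIES = ["ffmpeg", "ffprobe", "yt-dlp"]
REQUIRED_MODULES = ["whisper", "requests"]  # openai-whisper is imported as `whisper`
REQUIRED_KEYS = ["KLAP_API_KEY", "CAPTIONS_AI_API_KEY"]


def find_dotenv(start: Path) -> Optional[Path]:
    """Return the first .env found in `start` or any of its parent directories."""
    for directory in [start, *start.parents]:
        candidate = directory / ".env"
        if candidate.is_file():
            return candidate
    return None


def read_dotenv(path: Path) -> Dict[str, str]:
    """Minimal KEY=VALUE parser (assumption: no interpolation or multi-line values)."""
    env: Dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip("'\"")
    return env


def missing_requirements(cwd: Path) -> List[str]:
    """List every binary, module, or API key the skill needs that is unavailable."""
    missing = [b for b in REQUIRED_BINARIES if shutil.which(b) is None]
    missing += [m for m in REQUIRED_MODULES if importlib.util.find_spec(m) is None]
    dotenv = find_dotenv(cwd.resolve())
    keys = read_dotenv(dotenv) if dotenv else {}
    missing += [k for k in REQUIRED_KEYS if not keys.get(k)]
    return missing
```

If `missing_requirements(Path.cwd())` returns a non-empty list, report those items to the user before going further.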
### Cost Per Clip
| Step | Cost |
|---|---|
| Whisper (transcription) | Free (local) |
| FFmpeg (clip extraction) | Free (local) |
| Klap (reframing) | ~$1.50-2.50/clip depending on plan |
| Captions.ai (captions) | ~$0.15/min of output |
| **Total per clip** | **~$2-3** |
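The per-clip figure is the Klap fee plus the Captions.ai per-minute rate applied to the clip's length. Here is a small sketch of that arithmetic, using the table's approximate rates as defaults. `clip_cost_range` is a hypothetical helper, and actual prices depend on your plans.

```python
from typing import Tuple


def clip_cost_range(duration_s: float,
                    klap_low: float = 1.50,
                    klap_high: float = 2.50,
                    captions_per_min: float = 0.15) -> Tuple[float, float]:
    """Approximate (low, high) USD cost of one clip; Whisper and FFmpeg are free.

    Hypothetical helper: default rates are the approximate figures from the table.
    """
    captions = captions_per_min * duration_s / 60
    return round(klap_low + captions, 2), round(klap_high + captions, 2)
```

For example, `clip_cost_range(60)` returns `(1.65, 2.65)` for a one-minute clip.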
---
## Input
The user provides:
1. **Video source** (required) — one of:
   - **Local file path** — e.g. `/path/to/podcast.mp4`
   - **YouTube URL** — e.g. `https://www.youtube.com/watch?v=...`
   - **Any public video URL** — direct link to MP4
2. **Moment selection mode** (ask the user):
   - **Automatic** — Claude picks the best moments
   - **Manual** — user provides specific timestamps
   - **Hybrid** — Claude proposes moments, user approves/adjusts before processing
3. **Number of clips** (optional) — default 3-5, depending on video length and content density.
4. **Caption template** (optional) — Captions.ai template ID. Default: `ctpl_DxflLOnuKkb198FNdI9E` (Heat). List the available templates via the API if the user wants to browse.
5. **Target clip duration** (optional) — default 15-60 seconds. User can specify a range.
---
## Pipeline
### Step 1: Get the Video
Based on input type:
**Local file:**
```bash
# Verify it exists and get duration
ffprobe -v quiet -print_format json -show_format "video.mp4"
```
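The `ffprobe` call above prints JSON. A minimal Python sketch of the existence check and duration lookup is below. It assumes `ffprobe` is on PATH and reads `format.duration`, which ffprobe reports as a string of seconds. The helper names `parse_duration` and `probe_duration` are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def parse_duration(ffprobe_json: str) -> float:
    """Extract the container duration (seconds) from ffprobe -show_format JSON."""
    info = json.loads(ffprobe_json)
    # ffprobe reports format.duration as a string, e.g. "3605.120000"
    return float(info["format"]["duration"])


def probe_duration(path: str) -> float:
    """Hypothetical helper: verify the file exists, run ffprobe, return its duration."""
    if not Path(path).is_file():
        raise FileNotFoundError(path)
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_format", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_duration(out)
```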
**YouTube URL:**
```bash
yt-dlp -f "bestvideo[height<=720]+bestaudio/best[height<=720]" --merge-output-format mp4 -o "<workdir>/source.mp4" "<URL>"
```
**Other URL:**
```bash
curl -L -o "<workdir>/source.mp4" "<URL>"
```
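The three cases above can be dispatched from a single input string. This sketch returns the matching command as an argument list, or `None` for a local file. It reuses the exact `yt-dlp` and `curl` invocations shown above. The YouTube host list and the `fetch_command` name are assumptions, and any non-YouTube URL is treated as a direct MP4 link.

```python
from pathlib import Path
from typing import List, Optional
from urllib.parse import urlparse

# Assumed set of hosts routed to yt-dlp; everything else is fetched with curl
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def fetch_command(source: str, workdir: str) -> Optional[List[str]]:
    """Return the download command for a URL, or None when `source` is a local path."""
    out = str(Path(workdir) / "source.mp4")
    parsed = urlparse(source)
    if parsed.scheme not in ("http", "https"):
        return None  # local file: verify it with ffprobe instead of downloading
    if parsed.hostname in YOUTUBE_HOSTS:
        return [
            "yt-dlp",
            "-f", "bestvideo[height<=720]+bestaudio/best[height<=720]",
            "--merge-output-format", "mp4",
            "-o", out,
            source,
        ]
    # Any other public URL is assumed to be a direct link to an MP4
    return ["curl", "-L", "-o", out, source]
```

Pass a non-`None` result to `subprocess.run(cmd, check=True)`. Argument lists avoid shell quoting problems with URLs that contain `&` or `?`.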
### Step 2: Transcribe with Whisper
```python
import whisper
model = whisper.load_model("base")
result = model.transcribe("source.mp4", language="en", word_timestamps=True)
```
Save both:
- `transcript.json` — full result with word-level timestamps (needed for Step 3)
- `transcript.txt` — readable version with timestamps per segment (for Claude to analyze)
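Both files can be written from the dict returned by `model.transcribe`. This sketch relies on the `segments` list (each with `start`, `end`, and `text`) that Whisper returns. The `save_transcript` name and the `HH:MM:SS.mmm` timestamp format are assumptions, not part of the skill.

```python
import json


def fmt_ts(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm (integer math avoids a 60.0-second rollover)."""
    total_ms = round(seconds * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def save_transcript(result: dict, json_path: str, txt_path: str) -> None:
    """Hypothetical helper: write Whisper's full result plus a per-segment text file."""
    with open(json_path, "w", encoding="utf-8") as f:
        # default=float coerces any NumPy scalar values that json cannot serialize
        json.dump(result, f, ensure_ascii=False, indent=2, default=float)
    with open(txt_path, "w", encoding="utf-8") as f:
        for seg in result["segments"]:
            f.write(f"[{fmt_ts(seg['start'])} - {fmt_ts(seg['end'])}] {seg['text'].strip()}\n")
```

Call `save_transcript(result, "transcript.json", "transcript.txt")` right after the `model.transcribe(...)` line.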
### Step 3: Identify Best Moments (Viral Scoring)
This is the key intelligence step. Claude reads the full transcript and identifies potential clip moments.
**Step 3a: Segment the transcript into candidate moments**
Scan the transcript for self-contained 15-60 second windows. Look for natural start/end points (topic changes, pauses, complete thoughts).
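One way to pre-compute candidate windows from `transcript.json` is to start only at segments that follow a pause, then grow each window segment by segment. A window is yielded when it ends on a complete sentence and its length falls inside the duration range. This is a hypothetical heuristic, not the skill's prescribed method. The 0.5 s pause threshold, the sentence-ending check, and the `candidate_windows` name are assumptions.

```python
from typing import Iterator, List, Tuple


def candidate_windows(
    segments: List[dict],
    min_dur: float = 15.0,
    max_dur: float = 60.0,
    min_pause: float = 0.5,  # assumed gap (seconds) that marks a natural start point
) -> Iterator[Tuple[float, float, str]]:
    """Yield (start, end, text) windows built from consecutive Whisper segments."""
    for i, first in enumerate(segments):
        # Only start a window at the very beginning or right after a noticeable pause
        if i > 0 and first["start"] - segments[i - 1]["end"] < min_pause:
            continue
        texts: List[str] = []
        for seg in segments[i:]:
            duration = seg["end"] - first["start"]
            if duration > max_dur:
                break
            texts.append(seg["text"].strip())
            # Close windows only on segments that end a sentence (a complete thought)
            if duration >= min_dur and texts[-1].endswith((".", "?", "!")):
                yield first["start"], seg["end"], " ".join(texts)
```

Each yielded window is only a candidate. Claude still applies the rubric in Step 3b to decide which ones are worth clipping.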
**Step 3b: Score each candidate moment on this rubric**
For each candidate, score 1-10 on these five criteria:
| Criteria | What to look for | Score guide |
|---|---|---|
| **Hook Strength** | Does the first sentence grab attention? Is it a surprising claim, provocative question, or bold statement? | 10 = "wait, what?" reaction. 1 = generic setup |
| **Quotability** | Contains a memorable one-liner that people would screenshot or share? | 10 = tweet-worthy standalone quote. 1 = no standalone phrases |
| **Emotional Intensity** | Does the speaker show passion, humor, anger, vulnerability, or conviction? | 10 = genuine emotion. 1 = monotone/flat delivery |
| **Self-Containedness** | Does it make complete sense without watching the rest of the video? | 10 = fully standalone. 1 = needs prior context |
| **Surprise/Controversy** | Does it challenge conventional wisdom, reveal something unexpected, or offer a hot take? | 10 = counterintuitive insight. 1 = commonly known information |
**Total score = sum of all five (max 50).**
**Step 3c: Rank and select top N moments**
- Sort by total score descending
- Select top N (user-specified or default 3-5)
- Ensure selected moments don't overlap
- Prefer variety in topics/angles — don't pick 3 clips about the same point
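The rubric in 3b and the ranking rules in 3c reduce to a sum followed by a greedy pick. This sketch assumes each candidate already carries its five 1-10 scores and a short `topic` label. The `Moment` class, its field names, and the exact-match topic check are illustrative simplifications of "prefer variety", not part of the skill.

```python
from dataclasses import dataclass, field
from typing import Dict, List

CRITERIA = ("hook", "quotability", "emotion", "self_contained", "surprise")


@dataclass
class Moment:
    start: float
    end: float
    topic: str  # hypothetical short label used to keep the selection varied
    scores: Dict[str, int] = field(default_factory=dict)

    @property
    def total(self) -> int:
        # Sum of the five 1-10 rubric scores (max 50)
        return sum(self.scores[c] for c in CRITERIA)


def overlaps(a: Moment, b: Moment) -> bool:
    """True when the two time ranges share any span."""
    return a.start < b.end and b.start < a.end


def select_moments(candidates: List[Moment], n: int = 3) -> List[Moment]:
    """Take the highest-scoring moments that neither overlap nor repeat a topic."""
    chosen: List[Moment] = []
    for m in sorted(candidates, key=lambda m: m.total, reverse=True):
        if any(overlaps(m, c) or m.topic == c.topic for c in chosen):
            continue
        chosen.append(m)
        if len(chosen) == n:
            break
    return chosen  # still in descending score order
```

A greedy pass can skip a slightly lower-scoring pair that would beat one overlapping winner. For a handful of clips per video, that trade-off is usually acceptable.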
**Step 3d: Present to user for approval**
For each selected moment, show:
- Timestamp range (start - end)
- Duration
- Transcript excerpt (first 2-3 lines)
- Score breakdown (hook/quotability/emotion/self-contained/surprise)
- Total score
- Suggested hook text for the clip
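To keep the proposals consistent from clip to clip, each one can be rendered from the same fields. `format_proposal` and `mmss` are hypothetical helpers, and the layout is an assumption. The `scores` dict is expected to be in rubric order (hook, quotability, emotion, self-contained, surprise), because the breakdown follows insertion order.

```python
from typing import Dict, List


def mmss(seconds: float) -> str:
    """Format seconds as MM:SS for the approval message."""
    m, s = divmod(int(round(seconds)), 60)
    return f"{m:02d}:{s:02d}"


def format_proposal(n: int, start: float, end: float, excerpt: List[str],
                    scores: Dict[str, int], hook: str) -> str:
    """Render one proposed clip; `scores` must be in rubric order (insertion order)."""
    breakdown = "/".join(str(v) for v in scores.values())
    lines = [
        f"Clip {n}: {mmss(start)} - {mmss(end)} ({end - start:.0f}s)",
        *(f"  > {line}" for line in excerpt[:3]),  # first 2-3 transcript lines
        f"  Scores (hook/quotability/emotion/self-contained/surprise): {breakdown}",
        f"  Total: {sum(scores.values())}/50",
        f"  Suggested hook: {hook}",
    ]
    return "\n".join(lines)
```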
**Wait for user approval.** User can:
- Approve all
- Remove specific clips
- Add their own timestamps
- Adjust start/end times
- Request more options
**Do NOT proceed to Step 4 until user approves.**
### Step 4: Extract Raw Clips
For each approved moment, extract with FFmpeg:
```bash
ffmpeg -y -ss <start> -to <end> -i source.mp4 -c copy clip<N>-raw.mp4
```
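With `-c copy`, the command above avoids re-encoding. However, stream copy can only cut on keyframes, so a clip may not start exactly at `<start>`. This sketch builds the command for each approved moment and offers an optional re-encode for frame-accurate cuts. The `reencode` switch and the `libx264`/`aac` codec choice are assumptions, not part of the skill.

```python
from typing import List


def extract_command(source: str, start: float, end: float, out: str,
                    reencode: bool = False) -> List[str]:
    """Build the FFmpeg command for one clip (times in seconds)."""
    cmd = ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-to", f"{end:.3f}", "-i", source]
    if reencode:
        # Assumed codec choice: re-encoding lets the cut land on the exact frame
        cmd += ["-c:v", "libx264", "-c:a", "aac"]
    else:
        # Stream copy is fast and lossless but snaps to keyframes
        cmd += ["-c", "copy"]
    return cmd + [out]
```

For example, run `subprocess.run(extract_command("source.mp4", s, e, f"clip{n}-raw.mp4"), check=True)` for each approved `(s, e)` pair.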
### Step 5: Reframe with Klap
Upload each raw clip to Klap for AI-powered speaker-tracked reframing to 9:16.
**API: Klap**
- Endpoint: `POST https://api.klap.app/v2/tasks/video-to-video`
- Auth: `Authorization: Bearer <KLAP_API_KEY>`
**Submit each clip:**
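```python
import os

import requests

# Assumes the .env values have already been loaded into the environment
klap_key = os.environ["KLAP_API_KEY"]

headers = {
    "Authorization": f"Bearer {klap_key}",
}

# Direct file upload
with open("clip-raw.mp4", "rb") as f:
    r = requests.post(
        "https://api.klap.app/v2/tasks/video-to-video",
        headers=headers,
        files={"video": f},
        # The `data=` form fields (beginning with a "langua..." key) are
        # truncated in the source; take them from the Klap API reference.
    )
r.raise_for_status()
```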