video-clipper

Video Clipper transforms long-form video content like podcasts and interviews into vertical short-form clips optimized for Instagram Reels, TikTok, and YouTube Shorts. The skill automatically transcribes audio, identifies compelling moments, extracts clips with speaker-tracked reframing from landscape to portrait orientation, and applies animated captions. Use this when repurposing talking-head content into multiple viral-ready clips with minimal manual editing.

Repository: goose-skills

Install in Claude Code

git clone --depth 1 https://github.com/gooseworks-ai/goose-skills /tmp/video-clipper && cp -r /tmp/video-clipper/skills/design/packs/video-production/video-clipper ~/.claude/skills/video-clipper

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Video Clipper

Takes a long-form video and produces ready-to-post short-form vertical clips with speaker-tracked framing and professional animated captions. Works with podcasts, interviews, talks, and any talking-head content.

---

## Requirements

- **FFmpeg** installed and available in PATH (`brew install ffmpeg` on macOS, `apt install ffmpeg` on Linux)
- **Python 3** with `openai-whisper` and `requests` packages (`pip install openai-whisper requests`). **Note:** `openai-whisper` installs PyTorch (~2GB download). This skill uses `openai-whisper` instead of the lighter `whisper-cpp` because it provides word-level timestamps needed for accurate viral moment scoring.
- **yt-dlp** installed (for YouTube/URL downloads) — `brew install yt-dlp` on macOS, `pip install yt-dlp` on Linux
- **API Keys** in `.env` file (project root or any parent directory):
  - `KLAP_API_KEY` — from [klap.app](https://klap.app) (reframing with speaker tracking)
  - `CAPTIONS_AI_API_KEY` — from [captions.ai](https://captions.ai) / [platform.mirage.app](https://platform.mirage.app) (animated captions)
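
The "project root or any parent directory" lookup can be sketched as a walk up the directory tree. This is a minimal illustration, not the skill's actual loader: `find_env_file`, `parse_env`, and `missing_keys` are hypothetical helper names, and the parser only handles plain `KEY=VALUE` lines.

```python
from pathlib import Path
from typing import Dict, List, Optional

REQUIRED_KEYS = ("KLAP_API_KEY", "CAPTIONS_AI_API_KEY")


def find_env_file(start: Path) -> Optional[Path]:
    """Return the nearest .env in `start` or any of its parents, or None."""
    start = start.resolve()
    for folder in [start, *start.parents]:
        candidate = folder / ".env"
        if candidate.is_file():
            return candidate
    return None


def parse_env(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Dict[str, str]) -> List[str]:
    """List required API keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

If `missing_keys` returns anything, stop and ask the user to add those keys before running the paid steps.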

**Before starting:** Verify that FFmpeg, yt-dlp, and the Python packages are installed. If any are missing, instruct the user to install them before proceeding.
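
One way to run this check is with the standard library alone. The sketch below assumes the tool names used elsewhere in this document (`ffmpeg`, `ffprobe`, `yt-dlp`) and relies on `openai-whisper` being importable as `whisper`; `missing_dependencies` is a hypothetical helper name.

```python
import importlib.util
import shutil

# ffprobe ships with FFmpeg and is used in Step 1.
REQUIRED_TOOLS = ("ffmpeg", "ffprobe", "yt-dlp")
# The openai-whisper package is imported as `whisper`.
REQUIRED_MODULES = ("whisper", "requests")


def missing_dependencies(tools=REQUIRED_TOOLS, modules=REQUIRED_MODULES):
    """Return human-readable names of anything that is not installed."""
    missing = [f"{tool} (not on PATH)" for tool in tools if shutil.which(tool) is None]
    missing += [
        f"{module} (Python package)"
        for module in modules
        if importlib.util.find_spec(module) is None
    ]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        print("Install before proceeding: " + ", ".join(problems))
    else:
        print("All dependencies found.")
```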

### Cost Per Clip

| Step | Cost |
|---|---|
| Whisper (transcription) | Free (local) |
| FFmpeg (clip extraction) | Free (local) |
| Klap (reframing) | ~$1.50-2.50/clip depending on plan |
| Captions.ai (captions) | ~$0.15/min of output |
| **Total per clip** | **~$2-3** |
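
As a worked example of the arithmetic behind the table, the sketch below combines the two paid steps for a clip of a given length. The rates are the approximate figures from the table, not quoted prices, and `estimate_clip_cost` is a hypothetical helper.

```python
# Approximate rates taken from the table above (USD).
KLAP_COST_RANGE = (1.50, 2.50)   # per clip, depending on plan
CAPTIONS_COST_PER_MIN = 0.15     # per minute of output


def estimate_clip_cost(duration_seconds: float) -> tuple:
    """Return a (low, high) cost estimate in USD for one clip."""
    captions = CAPTIONS_COST_PER_MIN * duration_seconds / 60
    low, high = KLAP_COST_RANGE
    return round(low + captions, 2), round(high + captions, 2)
```

Because the captions charge scales with duration and clips are at most a minute long, the Klap reframing step dominates the per-clip cost.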

---

## Input

The user provides:

1. **Video source** (required) — one of:
   - **Local file path** — e.g. `/path/to/podcast.mp4`
   - **YouTube URL** — e.g. `https://www.youtube.com/watch?v=...`
   - **Any public video URL** — direct link to MP4

2. **Moment selection mode** (ask the user):
   - **Automatic** — Claude picks the best moments
   - **Manual** — user provides specific timestamps
   - **Hybrid** — Claude proposes moments, user approves/adjusts before processing

3. **Number of clips** (optional) — default 3-5. Depends on video length and content density.

4. **Caption template** (optional) — Captions.ai template ID. Default: `ctpl_DxflLOnuKkb198FNdI9E` (Heat). List available templates via the API if user wants to browse.

5. **Target clip duration** (optional) — default 15-60 seconds. User can specify a range.

---

## Pipeline

### Step 1: Get the Video

Based on input type:

**Local file:**
```bash
# Verify it exists and get duration
ffprobe -v quiet -print_format json -show_format "video.mp4"
```

**YouTube URL:**
```bash
yt-dlp -f "bestvideo[height<=720]+bestaudio/best[height<=720]" --merge-output-format mp4 -o "<workdir>/source.mp4" "<URL>"
```

**Other URL:**
```bash
curl -L -o "<workdir>/source.mp4" "<URL>"
```
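
Choosing between these three branches can be sketched as a small dispatcher. This is an illustration under assumptions: the list of YouTube hostnames is not exhaustive, and `classify_source` and `fetch_command` are hypothetical helpers that only build the commands shown above without running them.

```python
from urllib.parse import urlparse

# Assumed set of YouTube hostnames; extend as needed.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_source(source: str) -> str:
    """Return 'youtube', 'url', or 'local' for a user-supplied video source."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        return "youtube" if host in YOUTUBE_HOSTS else "url"
    return "local"


def fetch_command(source: str, workdir: str) -> list:
    """Build the command from Step 1 that matches the source type."""
    kind = classify_source(source)
    output = f"{workdir}/source.mp4"
    if kind == "youtube":
        return [
            "yt-dlp",
            "-f", "bestvideo[height<=720]+bestaudio/best[height<=720]",
            "--merge-output-format", "mp4",
            "-o", output,
            source,
        ]
    if kind == "url":
        return ["curl", "-L", "-o", output, source]
    # Local file: only probe it to confirm it exists and read its duration.
    return ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_format", source]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`.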

### Step 2: Transcribe with Whisper

```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("source.mp4", language="en", word_timestamps=True)
```

Save both:
- `transcript.json` — full result with word-level timestamps (needed for Step 3)
- `transcript.txt` — readable version with timestamps per segment (for Claude to analyze)
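
A minimal sketch of writing both files from the `result` dictionary returned by `model.transcribe`. It assumes the usual Whisper result shape (a `segments` list whose items have `start`, `end`, and `text`); the `[MM:SS - MM:SS]` line format and the helper names are illustrative choices, not something the skill prescribes.

```python
import json
from pathlib import Path


def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS, truncating fractions."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def save_transcripts(result: dict, workdir: str) -> None:
    """Write transcript.json (full result) and transcript.txt (one line per segment)."""
    out = Path(workdir)
    out.mkdir(parents=True, exist_ok=True)

    # default=float converts any stray NumPy scalars into plain floats.
    (out / "transcript.json").write_text(
        json.dumps(result, indent=2, default=float), encoding="utf-8"
    )

    lines = [
        f"[{format_timestamp(seg['start'])} - {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"
        for seg in result["segments"]
    ]
    (out / "transcript.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
```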

### Step 3: Identify Best Moments (Viral Scoring)

This is the key intelligence step. Claude reads the full transcript and identifies potential clip moments.

**Step 3a: Segment the transcript into candidate moments**

Scan the transcript for self-contained 15-60 second windows. Look for natural start/end points (topic changes, pauses, complete thoughts).
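
In the skill this scan is done by reading the transcript, but a mechanical pre-pass can narrow the search. The sketch below is one possible heuristic, not the skill's method: it groups consecutive Whisper segments and closes a window at the first pause once the window is at least 15 seconds long. The 0.8-second pause threshold and the function names are assumptions.

```python
from typing import Dict, List


def _close(window: List[Dict], out: List[Dict], min_len: float, max_len: float) -> None:
    """Append the window to `out` if its duration is within [min_len, max_len]."""
    if not window:
        return
    start, end = window[0]["start"], window[-1]["end"]
    if min_len <= end - start <= max_len:
        text = " ".join(seg["text"].strip() for seg in window)
        out.append({"start": start, "end": end, "text": text})


def candidate_windows(segments, min_len=15.0, max_len=60.0, pause=0.8):
    """Group consecutive segments into windows that end on a pause."""
    windows: List[Dict] = []
    current: List[Dict] = []
    for i, seg in enumerate(segments):
        # Start a new window if this segment would push the current one past max_len.
        if current and seg["end"] - current[0]["start"] > max_len:
            _close(current, windows, min_len, max_len)
            current = []
        current.append(seg)
        duration = seg["end"] - current[0]["start"]
        is_last = i + 1 == len(segments)
        gap = float("inf") if is_last else segments[i + 1]["start"] - seg["end"]
        # A long enough window followed by a pause is a natural end point.
        if duration >= min_len and gap >= pause:
            _close(current, windows, min_len, max_len)
            current = []
    _close(current, windows, min_len, max_len)
    return windows
```

Pauses are only a proxy for topic changes and complete thoughts, so the resulting windows still need a read-through before scoring.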

**Step 3b: Score each candidate moment on this rubric**

For each candidate, score 1-10 on these five criteria:

| Criteria | What to look for | Score guide |
|---|---|---|
| **Hook Strength** | Does the first sentence grab attention? Is it a surprising claim, provocative question, or bold statement? | 10 = "wait, what?" reaction. 1 = generic setup |
| **Quotability** | Does it contain a memorable one-liner that people would screenshot or share? | 10 = tweet-worthy standalone quote. 1 = no standalone phrases |
| **Emotional Intensity** | Does the speaker show passion, humor, anger, vulnerability, or conviction? | 10 = genuine emotion. 1 = monotone/flat delivery |
| **Self-Containedness** | Does it make complete sense without watching the rest of the video? | 10 = fully standalone. 1 = needs prior context |
| **Surprise/Controversy** | Does it challenge conventional wisdom, reveal something unexpected, or offer a hot take? | 10 = counterintuitive insight. 1 = commonly known information |

**Total score = sum of all five (max 50).**
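
A small sketch of the total, with range checking so that a missing or out-of-range score is caught early. The criterion keys are hypothetical short names for the five rubric rows.

```python
# Hypothetical short keys for the five rubric rows.
CRITERIA = ("hook", "quotability", "emotion", "self_contained", "surprise")


def total_score(scores: dict) -> int:
    """Sum the five 1-10 scores; raise ValueError if any is missing or out of range."""
    missing = [name for name in CRITERIA if name not in scores]
    if missing:
        raise ValueError(f"missing scores: {missing}")
    for name in CRITERIA:
        if not 1 <= scores[name] <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {scores[name]}")
    return sum(scores[name] for name in CRITERIA)
```

For example, scores of 8, 7, 9, 6, and 5 give a total of 35 out of 50.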

**Step 3c: Rank and select top N moments**

- Sort by total score descending
- Select top N (user-specified or default 3-5)
- Ensure selected moments don't overlap
- Prefer variety in topics/angles — don't pick 3 clips about the same point
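
The ranking and overlap rules can be sketched as a greedy pass over the scored candidates. This covers only the first three bullets; topic variety is a judgment call and is not modeled here. `select_moments` and the `start`, `end`, and `total` keys are illustrative names.

```python
def select_moments(candidates: list, n: int) -> list:
    """Pick up to n highest-scoring candidates whose time ranges do not overlap."""
    chosen = []
    for cand in sorted(candidates, key=lambda c: c["total"], reverse=True):
        if len(chosen) == n:
            break
        # Two ranges are disjoint when one ends before the other starts.
        disjoint = all(
            cand["end"] <= kept["start"] or cand["start"] >= kept["end"]
            for kept in chosen
        )
        if disjoint:
            chosen.append(cand)
    return chosen
```

When two candidates overlap, the higher-scoring one wins, so a strong moment is never dropped in favor of a weaker neighbor.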

**Step 3d: Present to user for approval**

For each selected moment, show:
- Timestamp range (start - end)
- Duration
- Transcript excerpt (first 2-3 lines)
- Score breakdown (hook/quotability/emotion/self-contained/surprise)
- Total score
- Suggested hook text for the clip
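
One possible plain-text layout for this summary is sketched below. The field names (`excerpt`, `hook_text`, `scores`) and the exact formatting are assumptions made for illustration.

```python
SCORE_KEYS = ("hook", "quotability", "emotion", "self_contained", "surprise")


def mmss(seconds: float) -> str:
    """Format seconds as M:SS."""
    total = int(round(seconds))
    return f"{total // 60}:{total % 60:02d}"


def format_moment(index: int, moment: dict) -> str:
    """Render one proposed clip as a short block of text for user review."""
    scores = moment["scores"]
    breakdown = "/".join(str(scores[key]) for key in SCORE_KEYS)
    duration = moment["end"] - moment["start"]
    return "\n".join([
        f"Clip {index}: {mmss(moment['start'])} - {mmss(moment['end'])} ({duration:.0f}s)",
        f"  Excerpt: {moment['excerpt']}",
        f"  Scores (hook/quotability/emotion/self-contained/surprise): {breakdown}",
        f"  Total: {sum(scores[key] for key in SCORE_KEYS)}/50",
        f"  Suggested hook: {moment['hook_text']}",
    ])
```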

**Wait for user approval.** User can:
- Approve all
- Remove specific clips
- Add their own timestamps
- Adjust start/end times
- Request more options

**Do NOT proceed to Step 4 until user approves.**

### Step 4: Extract Raw Clips

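For each approved moment, extract with FFmpeg. Because `-c copy` copies the streams without re-encoding, cuts snap to keyframes and may not match the requested timestamps exactly; drop `-c copy` to re-encode if frame-accurate boundaries matter: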

```bash
ffmpeg -y -ss <start> -to <end> -i source.mp4 -c copy clip<N>-raw.mp4
```

### Step 5: Reframe with Klap

Upload each raw clip to Klap for AI-powered speaker-tracked reframing to 9:16.

**API: Klap**
- Endpoint: `POST https://api.klap.app/v2/tasks/video-to-video`
- Auth: `Authorization: Bearer <KLAP_API_KEY>`

**Submit each clip:**

```python
import requests

headers = {
    "Authorization": f"Bearer {klap_key}",
}

# Direct file upload
with open("clip-raw.mp4", "rb") as f:
    r = requests.post(
        "https://api.klap.app/v2/tasks/video-to-video",
        headers=headers,
        files={"video": f},
        data={
            "langua