clip-hand-skill
This skill provides expert knowledge for building video clipping pipelines, covering yt-dlp for downloading and subtitle retrieval, Whisper for speech-to-text transcription, SRT file generation, and ffmpeg for video processing with embedded subtitles. Use it when implementing cross-platform video workflows that require downloading source material, generating transcripts, and producing clipped video segments with synchronized captions.
git clone --depth 1 https://github.com/RightNow-AI/openfang /tmp/clip-hand-skill && cp -r /tmp/clip-hand-skill/crates/openfang-hands/bundled/clip ~/.claude/skills/clip-hand-skillSKILL.md
# Video Clipping Expert Knowledge
## Cross-Platform Notes
All tools (ffmpeg, ffprobe, yt-dlp, whisper) use **identical CLI flags** on Windows, macOS, and Linux. The differences are only in shell syntax:
| Feature | macOS / Linux | Windows (cmd.exe) |
|---------|---------------|-------------------|
| Suppress stderr | `2>/dev/null` | `2>NUL` |
| Filter output | `\| grep pattern` | `\| findstr pattern` |
| Delete files | `rm file1 file2` | `del file1 file2` |
| Null output device | `-f null -` | `-f null -` (same) |
| ffmpeg subtitle paths | `subtitles=clip.srt` | `subtitles=clip.srt` (relative OK, absolute needs `C\\:/path`) |
IMPORTANT: ffmpeg filter paths (`-vf "subtitles=..."`) always need forward slashes. On Windows with absolute paths, escape the colon: `subtitles=C\\:/Users/me/clip.srt`
Prefer using `file_write` tool for creating SRT/text files instead of shell echo/heredoc.
---
## yt-dlp Reference
### Download with Format Selection
```
# Best video up to 1080p + best audio, merged
yt-dlp -f "bv[height<=1080]+ba/b[height<=1080]" --restrict-filenames -o "source.%(ext)s" "URL"
# 720p max (smaller, faster)
yt-dlp -f "bv[height<=720]+ba/b[height<=720]" --restrict-filenames -o "source.%(ext)s" "URL"
# Audio only (for transcription-only workflows)
yt-dlp -x --audio-format wav --restrict-filenames -o "audio.%(ext)s" "URL"
```
### Metadata Inspection
```
# Get full metadata as JSON (duration, title, chapters, available subs)
yt-dlp --dump-json "URL"
# Key fields: duration, title, description, chapters, subtitles, automatic_captions
```
### YouTube Auto-Subtitles
```
# Download auto-generated subtitles in json3 format (word-level timing)
yt-dlp --write-auto-subs --sub-lang en --sub-format json3 --skip-download --restrict-filenames -o "source" "URL"
# Download manual subtitles if available
yt-dlp --write-subs --sub-lang en --sub-format srt --skip-download --restrict-filenames -o "source" "URL"
# List available subtitle languages
yt-dlp --list-subs "URL"
```
### Useful Flags
- `--restrict-filenames` — safe ASCII filenames (no spaces/special chars) — important on all platforms
- `--no-playlist` — download single video even if URL is in a playlist
- `-o "template.%(ext)s"` — output template (%(ext)s auto-detects format)
- `--cookies-from-browser chrome` — use browser cookies for age-restricted content
- `--extract-audio` / `-x` — extract audio only
- `--audio-format wav` — convert audio to wav (for whisper)
---
## Whisper Transcription Reference
### Audio Extraction for Whisper
```
# Extract mono 16kHz WAV (whisper's preferred input format)
ffmpeg -i source.mp4 -vn -ar 16000 -ac 1 -y audio.wav
```
### Basic Transcription
```
# Standard transcription with word-level timestamps
whisper audio.wav --model small --output_format json --word_timestamps true --language en
# Faster alternative (same flags, 4x speed)
whisper-ctranslate2 audio.wav --model small --output_format json --word_timestamps true --language en
```
### Model Sizes
| Model | VRAM | Speed | Quality | Use When |
|-------|------|-------|---------|----------|
| tiny | ~1GB | Fastest | Rough | Quick previews, testing pipeline |
| base | ~1GB | Fast | OK | Short clips, clear speech |
| small | ~2GB | Good | Good | **Default — best balance** |
| medium | ~5GB | Slow | Better | Important content, accented speech |
| large-v3 | ~10GB | Slowest | Best | Final production, multiple languages |
Note: On macOS Apple Silicon, consider `mlx-whisper` as a faster native alternative.
### JSON Output Structure
```json
{
"text": "full transcript text...",
"segments": [
{
"id": 0,
"start": 0.0,
"end": 4.52,
"text": " Hello everyone, welcome back.",
"words": [
{"word": " Hello", "start": 0.0, "end": 0.32, "probability": 0.95},
{"word": " everyone,", "start": 0.32, "end": 0.78, "probability": 0.91},
{"word": " welcome", "start": 0.78, "end": 1.14, "probability": 0.98},
{"word": " back.", "start": 1.14, "end": 1.52, "probability": 0.97}
]
}
]
}
```
- `segments[].words[]` gives word-level timing when `--word_timestamps true`
- `probability` indicates confidence (< 0.5 = likely wrong)
---
## YouTube json3 Subtitle Parsing
### Format Structure
```json
{
"events": [
{
"tStartMs": 1230,
"dDurationMs": 5000,
"segs": [
{"utf8": "hello ", "tOffsetMs": 0},
{"utf8": "world ", "tOffsetMs": 200},
{"utf8": "how ", "tOffsetMs": 450},
{"utf8": "are you", "tOffsetMs": 700}
]
}
]
}
```
### Extracting Word Timing
For each event and each segment within it:
- `word_start_ms = event.tStartMs + seg.tOffsetMs`
- `word_start_secs = word_start_ms / 1000.0`
- `word_text = seg.utf8.trim()`
Events without `segs` are line breaks or formatting — skip them.
Events with `segs` containing only `"\n"` are newlines — skip them.
---
## SRT Generation from Transcript
### SRT Format
```
1
00:00:00,000 --> 00:00:02,500
First line of caption text
2
00:00:02,500 --> 00:00:05,100
Second line of caption text
```
### Rules for Building Good SRT
- Group words into subtitle lines of ~8-12 words (2-3 seconds per line)
- Break at natural pause points (periods, commas, clause boundaries)
- Keep lines under 42 characters for readability on mobile
- Adjust timestamps relative to clip start (subtract clip start time from all timestamps)
- Timestamp format: `HH:MM:SS,mmm` (comma separator, not dot)
- Each entry: index line, timestamp line, text line(s), blank line
- Use `file_write` tool to create the SRT file — works identically on all platforms
### Styled Captions with ASS Format
For animated/styled captions, use ASS subtitle format instead of SRT:
```
ffmpeg -i clip.mp4 -vf "subtitles=clip.ass:force_style='FontSize=22,FontName=Arial,Bold=1,PrimaryColour=&H00FFFFFF,OutlineColour=&H00000000,Outline=2,Shadow=1,Alignment=2,MarginV=40'" -c:a copy output.mp4
```
Key ASS style properties:
- `PrimaryColour=&H00FFFFFF` — white textPlaywright-based browser automation patterns for autonomous web interaction
Expert knowledge for AI intelligence collection — OSINT methodology, entity extraction, knowledge graphs, change detection, and sentiment analysis
Expert knowledge for the Infisical Sync Hand — Infisical API reference, vault operations, error patterns, security guidance
Expert knowledge for AI lead generation — web research, enrichment, scoring, deduplication, and report generation
Expert knowledge for AI forecasting — superforecasting principles, signal taxonomy, confidence calibration, reasoning chains, and accuracy tracking
Expert knowledge for AI deep research — methodology, source evaluation, search optimization, cross-referencing, synthesis, and citation formats
Expert knowledge for autonomous market intelligence and trading — technical analysis, risk management, Alpaca API, financial data sources
Expert knowledge for AI Twitter/X management — API v2 reference, content strategy, engagement playbook, safety, and performance tracking