Skip to main content
ClaudeWave
Skill82 repo starsupdated 2d ago

wjs-reframing-video

Use when the user wants to convert a video between horizontal and vertical orientations while preserving the inverted aspect ratio (16:9 ↔ 9:16, 4:3 ↔ 3:4, 21:9 ↔ 9:21). The skill crops a narrow band from the source and tracks the active speaker — the person whose mouth is moving — via MediaPipe face landmarks and mouth-aspect-ratio variance, so the talker stays in frame even when other people are visible. Triggers — "横转竖", "竖转横", "做成竖屏发抖音/视频号/小红书", "16:9 to 9:16", "make this vertical for Reels / TikTok / YouTube Shorts", "crop to portrait", "convert to landscape".

Install in Claude Code
Copy
git clone --depth 1 https://github.com/jianshuo/claude-skills /tmp/wjs-reframing-video && cp -r /tmp/wjs-reframing-video/wjs-reframing-video ~/.claude/skills/wjs-reframing-video
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# wjs-reframing-video

Convert a video's orientation by **cropping** a narrow band from the source — not by physically rotating it. The crop window follows the **active speaker** (the face whose mouth is *moving*), not just the largest or most-confident face. A `.crop.json` sidecar records the crop plan, the per-segment speaker decisions, and the parameters used. The original input is never modified.

## When to use

- Repurposing a 16:9 podcast / interview / talk for vertical short-video platforms (WeChat Channels 视频号, Douyin 抖音, Xiaohongshu 小红书, YouTube Shorts, TikTok, Reels).
- Repurposing a 9:16 phone recording for horizontal players (YouTube long-form, blog embeds).
- Repurposing 4:3 archive footage for 3:4 mobile, or vice versa.

The output aspect is the source aspect with width and height swapped — 16:9 → 9:16, not "letterboxed 16:9 in a 9:16 frame".

## When NOT to use

- **Multi-person Q&A** where each face needs its own crop — this skill picks one crop track per video. For per-speaker split renders, use **wjs-editing-multicam** instead.
- **Animated content / B-roll with no faces** — falls back to center crop, usually wrong for the intent.
- **Heavy camera motion in the source** (handheld pan/zoom) — the face tracker amplifies camera shake. Stabilize first.
- **Source already at target aspect** — no work to do.

## What this skill IS — and IS NOT

| Is | Is not |
|---|---|
| **Visual active-speaker detection** via MAR (mouth-aspect-ratio) variance | Audio-visual fusion (audio energy + lip motion cross-correlated) |
| Stable face tracking across frames by center-distance matching | Re-identification across long gaps / occlusions |
| Speaker-aligned segments with hysteresis to prevent flicker | Frame-by-frame switching on every flicker |
| `--face-pick speaker` (default) — pick whoever's mouth is moving | `--face-pick largest` (opt-in legacy) — pick largest face |
| **Hard cuts between segments, fixed crop within each segment** (`--motion cut`, default) | Smooth panning that drifts during a speaker's turn (opt-in `--motion smooth`) |
| Audio stream-copy (bit-exact) | Audio reprocessing / re-encoding |
| MediaPipe Tasks `FaceLandmarker` (478-pt mesh) at 5 fps sampled via ffmpeg | Per-frame neural inpainting / out-painting |
| One `ffmpeg crop + scale` pass | Frame-by-frame Python compositor |

Falls back to "largest face" automatically when no one is talking (silence, music-only stretches).

## Dependencies

```bash
pip install mediapipe opencv-python numpy
```

(MediaPipe lives outside the standard Python distribution; ffmpeg and ffprobe must be on `PATH`.)

**First-run model download**: MediaPipe 0.10+ uses the Tasks API, which needs a `face_landmarker.task` model file (~4 MB). On the first call, `crop.py` downloads it to `~/.claude/skills/wjs-reframing-video/models/` and caches it for subsequent runs. The script fails offline on first run.

**Range limitation**: The bundled landmarker is tuned for faces within ~2 m of the camera (selfie / podcast / interview distance). Wide event shots with small faces may not detect — sample a frame first to confirm.

## Crop math

Source aspect = `W / H`. Target aspect = `H / W` (inverted). Compute crop window:

| Source orientation | Crop window |
|---|---|
| Horizontal (W > H) → Portrait | `W_crop = H × H / W`, `H_crop = H` (narrow vertical band) |
| Portrait (W < H) → Horizontal | `W_crop = W`, `H_crop = W × W / H` (narrow horizontal band) |

For 1920×1080 → portrait, `W_crop = 608`, `H_crop = 1080`. Final scale to 1080×1920 (upscale ~1.78×).
For 1080×1920 → landscape, `W_crop = 1080`, `H_crop = 608`. Final scale to 1920×1080.

Override the final size via `--output-size 1080x1920` if you want native crop dimensions instead of upscaling.

## Pipeline

1. **Probe** input dimensions, fps, duration via ffprobe.
2. **Decide orientation** — auto from aspect (`--target portrait|landscape` to override).
3. **Sample frames at `--sample-fps`** (default 5; high enough to catch mouth motion — Nyquist for speech is ~10 Hz, we need at least 4–5 fps).
4. **Detect face landmarks** per sampled frame with MediaPipe Tasks `FaceLandmarker` (478 landmarks). For each detected face record: center, size proxy, MAR (mouth-aspect-ratio = inner-lip vertical distance / horizontal mouth-corner distance).
5. **Track faces** across frames by center-distance matching → each face gets a stable `face_id`.
6. **Per-sample active speaker**: for each face track, variance of MAR over a sliding window (`--mar-var-window-sec`, default 1 s). The face with the highest variance is "speaking". Below `--mar-var-threshold`, no one is speaking → fall back to largest face.
7. **Hysteresis**: a candidate switch only commits if the new speaker is stable for `--min-segment-sec` (default 1.5 s). Shorter flickers are squashed — prevents the crop from ping-ponging on a one-frame mis-detection.
8. **Speaker-aligned segments** → for each segment, mean (cx, cy) of that speaker's face over the segment becomes the crop center, *fixed* for the full duration of the segment.
9. **Build a ffmpeg step-function expression** (`--motion cut`, default) that holds each segment's crop position constant and **jumps instantly at each segment boundary** — the visual feel of a real cut between camera angles. (`--motion smooth` switches to piecewise-linear pan between segment midpoints; rarely the right call for talking-head content because the camera appears to drift mid-sentence.)
10. **Render** one ffmpeg pass — `crop=W:H:x='expr':y='expr', scale=OUT_W:OUT_H`. The crop filter evaluates `x` and `y` per frame natively. Audio stream-copied.

`scripts/crop.py` is the implementation. Output side effects:
- `<input>.crop.json` — sidecar with the crop plan
- `<input>_cropped.mp4` — final cropped + scaled video

## Sidecar schema (`<input>.crop.json`)

```json
{
  "_about": "wjs-reframing-video crop plan for cam_a.MOV. Active-speaker detected via MAR variance.",
  "_help": {
    "source_size":     "[width, height] in
skill-quality-reviewerSubagent

Repo-wide drift detector for the wjs-* Claude Code skills in this marketplace. Sweeps every SKILL.md, scores it against the repo's own conventions (V-ing naming, trigger-phrase density, companion files, description shape), and returns a grouped punch list ordered by severity. Read-only — never edits files. Use before pushing a batch of skill changes, or whenever you wonder "are these skills still internally consistent?

wangjianshuo-perspectiveSkill

|

wjs-auditing-projectSkill

Use when the user asks to audit what's wrong with a project, "make it right", "看看项目出了什么问题", "为什么用户的需求还没上线", "为什么没提交App Store", "为什么没新build", or wants a holistic state-of-the-project check covering unmerged branches, stalled PRs, failed GitHub Actions, stale builds, plan drift (TODOS.md / ROADMAP), unreleased commits, and log errors. Runs read-only investigation, presents a grouped checklist, fixes only after explicit user confirmation. Aware of the Cathier iOS app workflow (Xcode + fastlane + auto-merge @claude PRs from in-app feedback).

wjs-burning-subtitlesSkill

Use when the user has a video + an SRT and wants the subtitles either burned into the pixels (libass, always-visible) or soft-muxed as a togglable track. Also handles the final composite step for the localization pipeline — burn subs, mix a dub track, and keep the original audio as a low-volume bed, all in ONE ffmpeg encode (no cascade). Verifies libass availability and auto-downloads a static evermeet ffmpeg build when Homebrew's stripped binary lacks it. Triggers — "烧字幕", "硬字幕", "burn subtitles", "burn-in subs", "embed subtitle", "soft mux SRT", "把字幕烧进视频", "做最终合成".

wjs-cleaning-spamSkill

Use when the user complains about spam on his X/Twitter posts — 同城面付 / 寻固炮 / 线下上门 / 免费破处 这类引流号在他推文下刷的 emoji 垃圾回复 — and wants them removed. Covers the last 7 days (X recent-search window). Triggers — "把这些spam删掉", "清理X垃圾回复", "推文下面好多引流号", "clean spam replies", "/wjs-cleaning-spam".

wjs-converting-text-to-videoSkill

Use when the user wants a 王建硕-style WeChat article (article.md) turned into a narrated short MP4 video — TTS voiceover via 火山引擎 Volcano TTS, HyperFrames CSS/GSAP animation per scene, subtle SFX, abstract watercolor background, full pipeline rendering to 1080×1920 portrait MP4 (30-90s). Triggers — "把这篇文章做成视频", "做一个解说视频", "讲解视频", "/wjs-converting-text-to-video".

wjs-converting-wp-to-hugoSkill

Use when migrating a WordPress site to a Hugo static site on GitHub Pages from a WXR export (.xml) plus the wp-content/uploads folder — preserving /archives/<id>/ URLs, localizing images, and deploying via GitHub Actions. Triggers — "把 WordPress 迁成 Hugo", "wordpress 转静态站", "migrate WordPress to Hugo", "WXR to Hugo", "publish WordPress to GitHub Pages", "/wjs-converting-wp-to-hugo".

wjs-dubbing-videoSkill

Use when the user has a video + a target-language SRT and wants the video to actually speak that language — generates a time-aligned TTS voice dub. Routes by voice ID — Volcano (豆包) TTS for Chinese, edge-tts neural for any language. Defaults to one voice (single-speaker); opt-in multi-speaker via visual diarization. Outputs `*_<lang>_dub.mp4` with the dub audio in place of the original. Final mixing (audio bed + burn-in) is handed off to `/wjs-burning-subtitles`. Triggers — "配音", "中文配音", "Chinese dub", "voice over this", "dub the video", "TTS this SRT", "different voice for each speaker".