Skip to main content
ClaudeWave
Skill82 repo starsupdated 2d ago

wjs-editing-multicam

Use when the user has 2+ recordings of the same event (each with a `.sync.json` sidecar from wjs-syncing-multicam) and wants them combined into a single MP4 — auto-switching between cams second-by-second on audio energy, with optional picture-in-picture inset. Triggers — "auto-edit multicam", "做个剪辑", "切几个机位", "把这几个视频合成一个", "combine these angles", "PiP overlay".

Install in Claude Code
Copy
git clone --depth 1 https://github.com/jianshuo/claude-skills /tmp/wjs-editing-multicam && cp -r /tmp/wjs-editing-multicam/wjs-editing-multicam ~/.claude/skills/wjs-editing-multicam
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# wjs-editing-multicam

Combine N synced camera angles into a single rendered MP4. Decisions are audio-energy-driven only — the cam with the loudest mic each second wins. Output is hard cuts (or hard cuts plus a corner PiP).

## Setup & commands

The implementation lives in the open-source **`polysync`** pip package (<https://pypi.org/project/polysync/> · <https://github.com/jianshuo/polysync>) — this skill no longer ships its own scripts. Install it, then drive it via its CLI:

```bash
python3 -m pip install -U polysync      # needs ffmpeg/ffprobe on PATH

polysync edit        CAM_A CAM_B CAM_C --out edl.json   # build the decision list
polysync render-cuts edl.json --out out.mp4             # hard cuts
polysync render-pip  edl.json --out out.mp4 --pip bottom-right   # cuts + corner inset
```

`edit` and the renderers read each input's `.sync.json` automatically. Sync first with **wjs-syncing-multicam** (`polysync sync`) if the sidecars don't exist yet.

Render flags for raw camera footage (see **Preflight** below for when to use each):

```bash
# Sony S-Log3 footage, shot vertical, FX6 cams turned on their side, for 小红书:
polysync render-cuts edl.json --out out.mp4 \
    --log slog3 \           # S-Log3/S-Gamut3.Cine -> Rec.709 grade
    --rotate 1:90 --rotate 2:90 \   # rotate cam1,cam2 90° CW (FX6 with no flag)
    --width 1080 --height 1920 --fill \   # vertical, crop-to-fill (no black bars)
    --duck-audio --audio-cams 0,1        # clean speaker-gated audio (cams 0,1 = the two lavs)
```

`--duck-audio` replaces the single-cam soundtrack with a speaker-gated mix: each moment keeps the active speaker's close mic and ducks the rest (much cleaner than a constant 2-mic sum, which piles up bleed/room tone). `--audio-cams 0,1` restricts gating to the real speaker mics — **always exclude the wide/room cam**, whose mic sits at a similar level but is reverby and would otherwise get picked. (See the editing-quality note below for the why.)

## Preflight — ALWAYS check raw footage before rendering (hard-won)

Straight-off-the-card footage renders WRONG without these checks. Spend 2 minutes here or you'll render the whole thing broken (we did).

1. **Color profile (Log).** Sony FX3/FX6 default to **S-Log3 / S-Gamut3.Cine** — flat, grey, low-contrast. It MUST be converted to Rec.709 or it looks washed-out and broken. Check the `.XML` sidecar's `CaptureGammaEquation` (`s-log3-cine`) or `ffprobe -show_entries stream=color_transfer`. Fix: `--log slog3`. (Performance: the LUT is applied AFTER downscale automatically — a 3D LUT on 4K is ~4x slower than on 1080p for an identical result.)
2. **Orientation.** Phone / vertically-mounted shoots record rotated. Extract one frame per cam (`ffmpeg -ss 200 -i CAM -frames:v 1 f.jpg`) and LOOK. Some cameras (FX3) write a rotation flag → ffmpeg auto-rotates, fine. Others (FX6 physically turned on its side) write **no flag** → people come out lying down. Fix per-cam: `--rotate 1:90` (90 = clockwise; try 270 if upside-from-the-other-side).
3. **Delivery orientation.** 小红书 / Reels / Shorts are **vertical** — and this footage is usually shot vertical. Render `--width 1080 --height 1920 --fill`. Default (1920×1080, pad) is for landscape only; mixing a portrait source into a landscape frame pillarboxes it (ugly black side bars).
4. **Staggered camera starts.** Cameras rarely roll at the same instant; the sidecar `delta`/coverage shows it. The opening N seconds may be single-cam (only the first-rolling camera covers t=0). That's unavoidable — expect a single-cam intro of `max(delta)` seconds.

## Editing quality — when `polysync edit` isn't enough

`polysync edit` switches to the **loudest mic** per second. With **close, bleeding mics** (each cam's mic also picks up the other speaker loudly), the close/guest mic stays loudest even when the *other* person talks, so the editor over-selects that cam and the cuts don't track the real speaker. Two fixes, applied by hand on the EDL when "cut to the correct speaker" matters:

- **Per-mic baseline-normalized speaker attribution.** Per second, subtract each mic's own median energy, then pick whichever mic is *highest relative to its own baseline* → that's who's actually talking. (polysync's raw `cam[k] - mean(others)` doesn't remove the per-mic baseline offset, so it favors the loud mic.)
- **Cutaways for rhythm (剪辑感).** Audio attribution alone gives long static holds. Cap any single shot (~8–10 s) and insert ~3 s cutaways to the **listener** (reaction shot) and the **wide** cam, alternating. The wide cam's quiet mic means the editor never picks it on energy — inject it deliberately for establishing / 整体.

## What this skill IS — and IS NOT

| Is | Is not |
|---|---|
| Audio-energy-driven cam switching | Face / framing detection (no face_recognition, no MediaPipe) |
| Single-source audio (one cam's mic) | Multi-mic mix / per-speaker gating |
| Hard cuts, with optional PiP inset | Crossfades / opacity transitions / sliding animations |
| `ffmpeg` concat + `overlay` filter renders | HyperFrames composition / `<hf-clip>` |
| Coverage-aware (won't pick a cam outside its sidecar window) | Frame-accurate beat alignment / VAD-edge cuts |

If you need face tracking, fade transitions, captions, or HyperFrames composition, use the **hyperframes** skill on top of this skill's MP4 output.

## REQUIRED INPUT

**Original camera files (untouched) plus their `.sync.json` sidecars next to them.** If sources aren't synced yet, run **wjs-syncing-multicam** first to write the sidecars. Missing sidecar = cam assumed at delta=0, full coverage.

`polysync edit` reads each sidecar for `delta_seconds` + `overlap_in_reference`, lifts the cam's audio envelope into the reference timeline, and only schedules a cam during its coverage window. `polysync render-cuts` / `polysync render-pip` apply `ffmpeg -itsoffset` per input using the EDL's `deltas[]` array.

## When NOT to use

- One source — nothing to switch between; use **video-segmentation**.
- Polishe
skill-quality-reviewerSubagent

Repo-wide drift detector for the wjs-* Claude Code skills in this marketplace. Sweeps every SKILL.md, scores it against the repo's own conventions (V-ing naming, trigger-phrase density, companion files, description shape), and returns a grouped punch list ordered by severity. Read-only — never edits files. Use before pushing a batch of skill changes, or whenever you wonder "are these skills still internally consistent?

wangjianshuo-perspectiveSkill

|

wjs-auditing-projectSkill

Use when the user asks to audit what's wrong with a project, "make it right", "看看项目出了什么问题", "为什么用户的需求还没上线", "为什么没提交App Store", "为什么没新build", or wants a holistic state-of-the-project check covering unmerged branches, stalled PRs, failed GitHub Actions, stale builds, plan drift (TODOS.md / ROADMAP), unreleased commits, and log errors. Runs read-only investigation, presents a grouped checklist, fixes only after explicit user confirmation. Aware of the Cathier iOS app workflow (Xcode + fastlane + auto-merge @claude PRs from in-app feedback).

wjs-burning-subtitlesSkill

Use when the user has a video + an SRT and wants the subtitles either burned into the pixels (libass, always-visible) or soft-muxed as a togglable track. Also handles the final composite step for the localization pipeline — burn subs, mix a dub track, and keep the original audio as a low-volume bed, all in ONE ffmpeg encode (no cascade). Verifies libass availability and auto-downloads a static evermeet ffmpeg build when Homebrew's stripped binary lacks it. Triggers — "烧字幕", "硬字幕", "burn subtitles", "burn-in subs", "embed subtitle", "soft mux SRT", "把字幕烧进视频", "做最终合成".

wjs-cleaning-spamSkill

Use when the user complains about spam on his X/Twitter posts — 同城面付 / 寻固炮 / 线下上门 / 免费破处 这类引流号在他推文下刷的 emoji 垃圾回复 — and wants them removed. Covers the last 7 days (X recent-search window). Triggers — "把这些spam删掉", "清理X垃圾回复", "推文下面好多引流号", "clean spam replies", "/wjs-cleaning-spam".

wjs-converting-text-to-videoSkill

Use when the user wants a 王建硕-style WeChat article (article.md) turned into a narrated short MP4 video — TTS voiceover via 火山引擎 Volcano TTS, HyperFrames CSS/GSAP animation per scene, subtle SFX, abstract watercolor background, full pipeline rendering to 1080×1920 portrait MP4 (30-90s). Triggers — "把这篇文章做成视频", "做一个解说视频", "讲解视频", "/wjs-converting-text-to-video".

wjs-converting-wp-to-hugoSkill

Use when migrating a WordPress site to a Hugo static site on GitHub Pages from a WXR export (.xml) plus the wp-content/uploads folder — preserving /archives/<id>/ URLs, localizing images, and deploying via GitHub Actions. Triggers — "把 WordPress 迁成 Hugo", "wordpress 转静态站", "migrate WordPress to Hugo", "WXR to Hugo", "publish WordPress to GitHub Pages", "/wjs-converting-wp-to-hugo".

wjs-dubbing-videoSkill

Use when the user has a video + a target-language SRT and wants the video to actually speak that language — generates a time-aligned TTS voice dub. Routes by voice ID — Volcano (豆包) TTS for Chinese, edge-tts neural for any language. Defaults to one voice (single-speaker); opt-in multi-speaker via visual diarization. Outputs `*_<lang>_dub.mp4` with the dub audio in place of the original. Final mixing (audio bed + burn-in) is handed off to `/wjs-burning-subtitles`. Triggers — "配音", "中文配音", "Chinese dub", "voice over this", "dub the video", "TTS this SRT", "different voice for each speaker".