Skill · 82 repo stars · updated 2d ago

wjs-syncing-multicam

Use when the user has 2+ video / audio recordings of the same event captured by different devices (cameras, phones, separate audio recorders) and wants them aligned to a single common timeline. Outputs only a lightweight `.sync.json` sidecar per input — original files are never re-encoded. Triggers — "多机位同步", "对齐这几个机位", "match camera timelines", "sync these angles", "audio drift between cameras", "separate audio recorder", "Riverside / Zoom recording that needs to line up".

Install in Claude Code
git clone --depth 1 https://github.com/jianshuo/claude-skills /tmp/wjs-syncing-multicam && cp -r /tmp/wjs-syncing-multicam/wjs-syncing-multicam ~/.claude/skills/wjs-syncing-multicam
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# wjs-syncing-multicam

Compute a single time offset for each multi-source recording of the same event using audio cross-correlation, and emit a `.sync.json` sidecar next to each original. **Originals are never modified, copied, or re-encoded.** Downstream tools use `-itsoffset` to apply the offset at consume time.

## Setup & commands

The implementation lives in the open-source **`polysync`** pip package (<https://pypi.org/project/polysync/> · <https://github.com/jianshuo/polysync>) — this skill no longer ships its own scripts. Ensure it's installed, then drive it via its CLI:

```bash
python3 -m pip install -U polysync      # needs ffmpeg/ffprobe on PATH

polysync sync   REFERENCE SOURCE        # align SOURCE to REFERENCE, write sidecars
polysync sync   REFERENCE SOURCE --partial   # source covers only part of the session
polysync verify REFERENCE SOURCE SOURCE.sync.json   # independent residual check
```

Run one `polysync sync` per non-reference angle (reference first, same reference each time). The sections below document the algorithm, the sidecar schema, and the gotchas baked into the package — read them to interpret output and choose flags.
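
The "one `polysync sync` per non-reference angle" rule can be scripted. The sketch below only builds the command lines, using the `sync` subcommand and `--partial` flag shown above; the helper name `build_sync_commands` and the file names are hypothetical, and nothing is executed unless you uncomment the `subprocess.run` call.

```python
import shlex

def build_sync_commands(reference, sources, partial=()):
    """Return one `polysync sync` argv per non-reference angle.

    `partial` lists sources that cover only part of the session.
    """
    commands = []
    for src in sources:
        if src == reference:
            continue  # the reference is never aligned to itself
        argv = ["polysync", "sync", reference, src]
        if src in partial:
            argv.append("--partial")
        commands.append(argv)
    return commands

# Hypothetical shoot: two cameras plus a recorder that was started late.
commands = build_sync_commands(
    "cam_a.MOV",
    ["cam_a.MOV", "cam_b.MOV", "recorder.wav"],
    partial={"recorder.wav"},
)
for argv in commands:
    print(shlex.join(argv))
    # import subprocess; subprocess.run(argv, check=True)
```

Keeping the same reference in every call means all sidecars end up describing offsets on one shared timeline.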

## Design principle — sidecar over re-encode

Earlier versions of this skill produced `*_synced.MOV` files by trimming + re-encoding to bake the offset into the file. We removed that:

- **Disk** — a 75-min 4K shoot from 3 cameras is 60+ GB. Re-encoded synced copies double that for no information gain.
- **Quality** — every re-encode is lossy. The originals are the source of truth; sidecars are reversible metadata.
- **Speed** — `_synced.MOV` generation took 10+ min per file on Apple Silicon; sidecar emission takes seconds.
- **Composability** — any downstream tool (`polysync edit`, NLE import, ffmpeg one-liners) reads the sidecar and applies the offset itself. No tool-specific file format lock-in.

## When NOT to use

- Single-camera footage — nothing to sync to. For splitting one source into clips, use **video-segmentation**.
- Sources already aligned in an NLE timeline — don't fight the editor.
- For the auto-edit / cut / PiP rendering step that comes AFTER sync, use **wjs-editing-multicam** (consumes these sidecars).

## Why envelope-based, not raw waveform

Raw PCM cross-correlation gives weak peaks and false matches when the two mics have different gain / room response — i.e., almost always with a secondary cam. The log-energy envelope captures dialogue and music dynamics, which both mics hear regardless of frequency response. **Don't skip the envelope step — it's the entire reason this skill is robust at low SNR.**
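
To illustrate the point, here is a minimal pure-Python sketch, not the `polysync` implementation. It assumes two simulated mics that share only a loudness pattern (different gain, independent noise carriers), uses mean removal as a crude stand-in for the high-pass filter, and uses brute-force correlation instead of FFT. All function names and the synthetic data are hypothetical.

```python
import math
import random

SR = 8000        # 8 kHz mono PCM
HOP = SR // 100  # 10 ms hop gives a 100 Hz envelope
WIN = SR // 20   # 50 ms analysis window

def log_energy_envelope(pcm):
    """Log of short-time energy per hop, with the mean removed.

    Mean removal is a crude stand-in for the high-pass filter: it cancels
    the constant offset that a pure gain change adds in the log domain.
    """
    env = []
    for start in range(0, len(pcm) - WIN + 1, HOP):
        energy = sum(s * s for s in pcm[start:start + WIN]) / WIN
        env.append(math.log(energy + 1e-12))
    mean = sum(env) / len(env)
    return [e - mean for e in env]

def best_lag(ref_env, src_env, max_lag):
    """Brute-force correlation: the lag (in hops) where src best matches ref."""
    best, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(v, ref_env[i + lag]) for i, v in enumerate(src_env)
                 if 0 <= i + lag < len(ref_env)]
        if not pairs:
            continue
        score = sum(a * b for a, b in pairs) / len(pairs)
        if score > best_score:
            best, best_score = lag, score
    return best

# Synthetic event: one shared loudness level per 100 ms, 12 s in total.
_levels_rng = random.Random(0)
LEVELS = [_levels_rng.uniform(0.05, 1.0) for _ in range(120)]

def record(start_s, dur_s, gain, seed):
    """Simulate one mic: shared loudness pattern, own gain, own noise carrier."""
    noise = random.Random(seed)
    first = int(start_s * SR)
    return [gain * LEVELS[(first + k) * 10 // SR] * noise.gauss(0.0, 1.0)
            for k in range(int(dur_s * SR))]

ref = record(0.0, 12.0, gain=1.0, seed=1)
src = record(1.5, 8.0, gain=0.2, seed=2)  # quieter mic, starts 1.5 s later

ref_env = log_energy_envelope(ref)
src_env = log_energy_envelope(src)
lag = best_lag(ref_env, src_env, max_lag=300)
print(f"estimated offset: {lag * HOP / SR:.2f} s")
```

A positive lag here means the source starts after the reference, matching the sign convention of `delta_seconds`.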

## Algorithm

1. **Extract mono PCM at 8 kHz, 16-bit** from each input. The audio stream is **auto-selected by loudness** (`loudest_audio_stream`): probe each `0:a:N` over a 60 s mid-file window and pick the highest mean volume. Multi-track pro cameras break a naive `0:a:0` — Sony FX6 MXF clips carry 4 mono PCM tracks and routinely leave a:0 / a:1 **dead (~-90 dB)** with the room mic on a:2 / a:3; correlating the silent track fails to sync. Single-stream inputs (most MP4 cams) short-circuit to a:0.
2. **Log-energy envelope** at 100 Hz (10 ms hop, 50 ms window). High-pass with a 2nd-order Butterworth, 0.05 Hz cutoff, filtfilt — removes slow drift and gain offsets.
3. **FFT cross-correlate envelopes** end-to-end → coarse offset, accurate to about one envelope hop (~10 ms).
4. **Refine at sample level** with a 60 s probe from B near the coarse-aligned position in A, ±2 s search window, parabolic peak interpolation.
5. **Multi-probe drift check** — repeat step 4 every ~3 min. Linear fit `delta(t) = slope·t + intercept` reveals real clock drift (5–50 ppm typical). Use the **midpoint-canonical** offset (`slope · midpoint + intercept`) so residual error is symmetric around zero.
6. **Compute overlap window** in the reference timeline: `overlap = [max(0, delta), min(ref_dur, delta + src_dur)]`.
7. **Emit `.sync.json` sidecar** next to each non-reference input. No file is copied, trimmed, or re-encoded. The reference input gets a sidecar too (with `delta_seconds: 0`) so downstream code can treat all inputs uniformly.
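
The parabolic peak interpolation in step 4 can be shown in isolation. This is a sketch of the standard three-point formula, assuming the correlation values around the peak are already computed; `parabolic_peak`, `refine_offset`, and the sample numbers are hypothetical and not taken from `polysync`.

```python
def parabolic_peak(y_prev, y_peak, y_next):
    """Sub-sample position of the maximum relative to the peak bin.

    Fits a parabola through three neighbouring correlation values and
    returns the offset of its vertex, between -0.5 and 0.5 samples.
    """
    denom = y_prev - 2.0 * y_peak + y_next
    if denom == 0.0:
        return 0.0  # flat neighbourhood, nothing to refine
    return 0.5 * (y_prev - y_next) / denom

def refine_offset(corr, lags, sample_rate):
    """Return the refined offset in seconds from sampled correlation values."""
    # Search interior bins only so both neighbours exist.
    k = max(range(1, len(corr) - 1), key=corr.__getitem__)
    frac = parabolic_peak(corr[k - 1], corr[k], corr[k + 1])
    return (lags[k] + frac) / sample_rate

# Hypothetical correlation curve whose true peak sits between two samples.
TRUE_PEAK = 98760.3                     # in samples at 8 kHz
lags = list(range(98757, 98764))
corr = [-(lag - TRUE_PEAK) ** 2 for lag in lags]
offset_s = refine_offset(corr, lags, sample_rate=8000)
print(f"refined offset: {offset_s:.6f} s")
```

The integer peak alone would be quantized to one sample (125 µs at 8 kHz); the parabola recovers the fractional part.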

`polysync sync` is the implementation. It emits **only** the `.sync.json` sidecar — no `_synced.MOV`, no re-encode.
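
Steps 5 and 6 reduce to a line fit and two clamps. The sketch below assumes the midpoint is taken over the probed time span (the text does not say which span is used) and that probe times and per-probe offsets are already measured; every name and number is hypothetical.

```python
def fit_drift(times, deltas):
    """Least-squares fit of delta(t) = slope * t + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_d = sum(deltas) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, deltas))
    slope = sxy / sxx
    return slope, mean_d - slope * mean_t

def canonical_offset(slope, intercept, t_start, t_end):
    """Offset evaluated at the midpoint, so end errors are equal and opposite."""
    midpoint = (t_start + t_end) / 2.0
    return slope * midpoint + intercept

def overlap_in_reference(delta, ref_dur, src_dur):
    """Window on the reference timeline where both recordings have coverage."""
    return [max(0.0, delta), min(ref_dur, delta + src_dur)]

# Hypothetical probes every 3 minutes with 20 ppm of clock drift.
times = [180.0 * k for k in range(25)]
deltas = [12.3 + 2e-5 * t for t in times]

slope, intercept = fit_drift(times, deltas)
delta = canonical_offset(slope, intercept, times[0], times[-1])
print(f"slope={slope:.2e}  canonical delta={delta:.4f} s")
print("residual at start:", deltas[0] - delta)
print("residual at end:  ", deltas[-1] - delta)
print("overlap:", overlap_in_reference(delta, ref_dur=4600.0, src_dur=4400.0))
```

A negative `delta` (source started first) clamps the overlap start to 0, which is why the `max(0, delta)` term is needed.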

## Sidecar schema (`<input>.sync.json`)

One sidecar per original input, written next to it. Pure JSON, no comments in-file — the field reference below is canonical.

```json
{
  "_about": "Sync metadata for cam_b.MOV. Apply via ffmpeg -itsoffset. See wjs-syncing-multicam SKILL.md for full schema.",
  "schema_version": 1,
  "source": "cam_b.MOV",
  "reference": "cam_a.MOV",
  "delta_seconds": 12.345,
  "drift_slope": 1.8e-5,
  "overlap_in_reference": [12.345, 4512.180],
  "overlap_in_source":    [0.000,   4499.835],
  "verification": {
    "median_residual_ms": 4.2,
    "residual_spread_ms": 11.8,
    "probe_count": 24
  }
}
```
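
A downstream consumer might read the sidecar along these lines. This is a sketch under assumptions: the helper names are hypothetical, the JSON is a trimmed copy of the example above, and the overlap consistency check encodes a relation that holds for that example rather than a documented guarantee.

```python
import json
from pathlib import Path

# Trimmed copy of the example sidecar above.
SIDECAR_TEXT = """
{
  "schema_version": 1,
  "source": "cam_b.MOV",
  "reference": "cam_a.MOV",
  "delta_seconds": 12.345,
  "drift_slope": 1.8e-5,
  "overlap_in_reference": [12.345, 4512.180],
  "overlap_in_source": [0.000, 4499.835]
}
"""

def ffmpeg_input_args(sidecar, sidecar_dir="."):
    """Input arguments that place the source on the reference timeline.

    -itsoffset is an ffmpeg input option, so it must come before its -i.
    """
    source_path = Path(sidecar_dir) / sidecar["source"]  # relative to the sidecar
    return ["-itsoffset", f"{sidecar['delta_seconds']:.3f}", "-i", str(source_path)]

def drift_filter(sidecar):
    """Optional audio filter for long-form lip-sync; None when drift is zero."""
    slope = sidecar.get("drift_slope", 0.0)
    return None if slope == 0.0 else f"atempo={1 + slope:.6f}"

def overlap_is_consistent(sidecar, tol=1e-3):
    """Check that the two overlap windows differ by delta_seconds (assumed relation)."""
    ref_start, ref_end = sidecar["overlap_in_reference"]
    src_start, src_end = sidecar["overlap_in_source"]
    delta = sidecar["delta_seconds"]
    return (abs((ref_start - delta) - src_start) < tol
            and abs((ref_end - delta) - src_end) < tol)

sidecar = json.loads(SIDECAR_TEXT)
print(ffmpeg_input_args(sidecar))
print(drift_filter(sidecar))
print(overlap_is_consistent(sidecar))
```

Because the offset is applied as an input option at consume time, the original file on disk stays untouched.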

### Field reference

| Field | Type | Meaning |
|---|---|---|
| `_about` | string | Human-readable one-liner. Includes pointer back to this SKILL.md. Always present. |
| `schema_version` | int | Bumps on any breaking change to this schema. Current: `1`. |
| `source` | string | Filename of the original this sidecar describes. Relative to the sidecar's directory. **Never points to a re-encoded file.** |
| `reference` | string | The input whose timeline we're aligned to. Reference's own sidecar lists itself here. |
| `delta_seconds` | float | The source's `t=0` expressed in the reference's timeline. **If positive, source starts after reference; pass to ffmpeg as `-itsoffset <delta>`.** Can be negative (source starts before reference, e.g. early-rolling camera). |
| `drift_slope` | float | Linear clock-drift slope (dimensionless, ~10⁻⁵). `0.0` means no measurable drift. Downstream applies `atempo = 1 + drift_slope` to the source ONLY for sync-sound / long-form lip-sync — for camera-cut editing, ignore. |
| `overlap_in_reference` | `[start, end]` (seconds) | The window during which both source and reference have cove
skill-quality-reviewer (Subagent)

Repo-wide drift detector for the wjs-* Claude Code skills in this marketplace. Sweeps every SKILL.md, scores it against the repo's own conventions (V-ing naming, trigger-phrase density, companion files, description shape), and returns a grouped punch list ordered by severity. Read-only — never edits files. Use before pushing a batch of skill changes, or whenever you wonder "are these skills still internally consistent?

wangjianshuo-perspective (Skill)

wjs-auditing-project (Skill)

Use when the user asks to audit what's wrong with a project, "make it right", "看看项目出了什么问题", "为什么用户的需求还没上线", "为什么没提交App Store", "为什么没新build", or wants a holistic state-of-the-project check covering unmerged branches, stalled PRs, failed GitHub Actions, stale builds, plan drift (TODOS.md / ROADMAP), unreleased commits, and log errors. Runs read-only investigation, presents a grouped checklist, fixes only after explicit user confirmation. Aware of the Cathier iOS app workflow (Xcode + fastlane + auto-merge @claude PRs from in-app feedback).

wjs-burning-subtitles (Skill)

Use when the user has a video + an SRT and wants the subtitles either burned into the pixels (libass, always-visible) or soft-muxed as a togglable track. Also handles the final composite step for the localization pipeline — burn subs, mix a dub track, and keep the original audio as a low-volume bed, all in ONE ffmpeg encode (no cascade). Verifies libass availability and auto-downloads a static evermeet ffmpeg build when Homebrew's stripped binary lacks it. Triggers — "烧字幕", "硬字幕", "burn subtitles", "burn-in subs", "embed subtitle", "soft mux SRT", "把字幕烧进视频", "做最终合成".

wjs-cleaning-spam (Skill)

Use when the user complains about spam on his X/Twitter posts — 同城面付 / 寻固炮 / 线下上门 / 免费破处 这类引流号在他推文下刷的 emoji 垃圾回复 — and wants them removed. Covers the last 7 days (X recent-search window). Triggers — "把这些spam删掉", "清理X垃圾回复", "推文下面好多引流号", "clean spam replies", "/wjs-cleaning-spam".

wjs-converting-text-to-video (Skill)

Use when the user wants a 王建硕-style WeChat article (article.md) turned into a narrated short MP4 video — TTS voiceover via 火山引擎 Volcano TTS, HyperFrames CSS/GSAP animation per scene, subtle SFX, abstract watercolor background, full pipeline rendering to 1080×1920 portrait MP4 (30-90s). Triggers — "把这篇文章做成视频", "做一个解说视频", "讲解视频", "/wjs-converting-text-to-video".

wjs-converting-wp-to-hugo (Skill)

Use when migrating a WordPress site to a Hugo static site on GitHub Pages from a WXR export (.xml) plus the wp-content/uploads folder — preserving /archives/<id>/ URLs, localizing images, and deploying via GitHub Actions. Triggers — "把 WordPress 迁成 Hugo", "wordpress 转静态站", "migrate WordPress to Hugo", "WXR to Hugo", "publish WordPress to GitHub Pages", "/wjs-converting-wp-to-hugo".

wjs-dubbing-video (Skill)

Use when the user has a video + a target-language SRT and wants the video to actually speak that language — generates a time-aligned TTS voice dub. Routes by voice ID — Volcano (豆包) TTS for Chinese, edge-tts neural for any language. Defaults to one voice (single-speaker); opt-in multi-speaker via visual diarization. Outputs `*_<lang>_dub.mp4` with the dub audio in place of the original. Final mixing (audio bed + burn-in) is handed off to `/wjs-burning-subtitles`. Triggers — "配音", "中文配音", "Chinese dub", "voice over this", "dub the video", "TTS this SRT", "different voice for each speaker".