
wjs-dubbing-video

This skill converts a video and a target-language SRT subtitle file into a dubbed video (`*_<lang>_dub.mp4`) in which time-aligned text-to-speech audio replaces the original soundtrack. It routes by voice ID: Volcano TTS for Mandarin Chinese, and edge-tts neural voices for other languages (or as a Mandarin fallback). It defaults to a single voice unless the user explicitly requests multi-speaker diarization for dialogue or interviews. Use it when a user provides both a video and translated subtitles and asks to "dub," "voice over," or add "Chinese dubbing" to their content.

Repository: claude-skills

Install in Claude Code


git clone --depth 1 https://github.com/jianshuo/claude-skills /tmp/wjs-dubbing-video && cp -r /tmp/wjs-dubbing-video/wjs-dubbing-video ~/.claude/skills/wjs-dubbing-video

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# wjs-dubbing-video

Video + target-language SRT → `*_<lang>_dub.mp4` with a time-aligned TTS voice. **This skill stops at the dub track.** Burning in subtitles and mixing the original audio back in as a low-volume bed belong to the next skill: `/wjs-burning-subtitles/render.py` composites everything in one final encode.
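Time alignment starts from the SRT's cue windows: each cue has a `start --> end` timing line, and its TTS clip has to land in that slot. As a rough sketch (the `parse_srt` helper and `Cue` class are hypothetical, not taken from `scripts/dub.py`), parsing an SRT into timed cues could look like this:

```python
import re
from dataclasses import dataclass

# SRT timestamps look like 00:01:02,345 (hours:minutes:seconds,milliseconds).
TS = r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
TIMING_RE = re.compile(TS + r"\s*-->\s*" + TS)


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float    # the TTS clip for this cue should finish by here
    text: str


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt: str) -> list[Cue]:
    """Hypothetical parser: split on blank lines, read the timing line, keep the text."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            m = TIMING_RE.search(line)
            if m:
                g = m.groups()
                text = " ".join(lines[i + 1:]).strip()  # join multi-line cue text
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues
```

Each `Cue`'s `end - start` is the time budget for its synthesized line; how `dub.py` handles a clip that overruns its slot is not covered here.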

## When to use

- User has a target-language SRT (e.g., `entrevista.zh-CN.srt`) and wants the video to speak that language.
- User says "中文配音 / 配音 / 帮我做配音 / dub it / voice over" (the Chinese phrases mean "Chinese dubbing", "dubbing", and "make a dub for me").
- User has multiple speakers on camera and wants different voices per speaker.

## When NOT to use

- No SRT yet → run `/wjs-transcribing-audio` then `/wjs-translating-subtitles` first.
- Source-language-only TTS (rare; usually you translate first) is not a reason to skip this skill: use it anyway and pass the source SRT.
- Burn-in only, no audio change → skip to `/wjs-burning-subtitles`.

## Number of speakers — default to one

**Default: assume one speaker.** Use a single voice for the entire dub. This is the right answer for monologues, vlogs, recorded talks, narrator-only clips, and the overwhelming majority of videos people ask about. Don't run diarization, don't tag the SRT with `[A]`/`[B]`, don't bring up multi-speaker complexity.

**Switch to multi-speaker only when the user explicitly says so** — phrasings like "two people", "interview", "dialogue", "conversation between", "separate the speakers", "different voice for each", or a direct request to do diarization. When triggered, follow the "Multi-speaker dubbing" section below.

If you're unsure whether a video is one speaker or many, ship the single-voice version first. Adding speaker separation later is cheap (just regenerate the dub); shipping confused multi-speaker output by default wastes the user's time.
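The rule above is a judgment call, not a keyword match, but a crude illustration of "default to one, opt in only on an explicit request" helps make the bias concrete. The function name and phrase list below are hypothetical; the phrases are taken from the triggers listed above.

```python
# Phrases from the section above that justify switching to multi-speaker mode.
MULTI_SPEAKER_TRIGGERS = (
    "two people",
    "interview",
    "dialogue",
    "conversation between",
    "separate the speakers",
    "different voice for each",
    "diarization",
)


def wants_multi_speaker(request: str) -> bool:
    """Hypothetical check: single voice unless the request names a trigger phrase."""
    text = request.lower()
    return any(phrase in text for phrase in MULTI_SPEAKER_TRIGGERS)
```

Anything that does not match falls through to the single-voice default, which mirrors the "ship the single-voice version first" advice.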

## Engine routing — by voice ID

`scripts/dub.py` auto-routes by voice-ID prefix:

| Voice ID pattern | Engine | Auth |
|---|---|---|
| `zh_..._bigtts` | **Volcano (字节跳动豆包) TTS** | `VOLC_TTS_APPID` + `VOLC_TTS_ACCESS_TOKEN` |
| `zh-CN-...Neural` / `en-US-...Neural` / etc. | **edge-tts** (Microsoft Edge neural) | none (free) |

For Mandarin, Volcano is markedly more natural than edge-tts, especially for emotional/contemplative content. Use edge-tts when Volcano credentials aren't available or as a debugging fallback.
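The routing table can be read as two name patterns. The sketch below is a hypothetical reconstruction of that idea, not the code in `scripts/dub.py`; it assumes Volcano IDs look like `zh_..._bigtts` and edge-tts IDs look like `<lang>-<REGION>-...Neural`.

```python
import re

# Assumed patterns, derived from the routing table above.
VOLCANO_RE = re.compile(r"^zh_.+_bigtts$")                  # zh_female_..._bigtts
EDGE_RE = re.compile(r"^[a-z]{2,3}-[A-Z]{2}-.+Neural$")     # zh-CN-XiaoxiaoNeural


def pick_engine(voice_id: str) -> str:
    """Hypothetical router: map a voice ID to the engine that serves it."""
    if VOLCANO_RE.match(voice_id):
        return "volcano"   # needs VOLC_TTS_APPID + VOLC_TTS_ACCESS_TOKEN
    if EDGE_RE.match(voice_id):
        return "edge-tts"  # no credentials required
    raise ValueError(f"unrecognized voice ID: {voice_id!r}")
```

For example, `pick_engine("zh-CN-XiaoxiaoNeural")` selects edge-tts, which is the fallback route for Mandarin when Volcano credentials are missing.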

## Volcano TTS (Chinese only)

Endpoint: `https://openspeech.bytedance.com/api/v3/tts/unidirectional`. The same URL serves both TTS 1.0 and 2.0; the `X-Api-Resource-Id` header selects the backend.

Headers:

```
X-Api-App-Id:       (env: VOLC_TTS_APPID)         # 10-digit speech App ID
X-Api-Access-Key:   (env: VOLC_TTS_ACCESS_TOKEN)  # 32-char token from speech console
X-Api-Resource-Id:  volc.service_type.10029       # see resource ID note below
Content-Type:       application/json
```

Loading credentials: they usually live in `~/code/.env`. Export them at the start of each session with:

```bash
set -a; source ~/code/.env; set +a
```
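Once the variables are exported, the headers listed above can be built from the environment. This is a minimal sketch with a hypothetical `volc_headers` helper; the `VOLC_TTS_RESOURCE` override and the `volc.service_type.10029` default come from the resource ID note below.

```python
import os
from typing import Mapping

VOLC_TTS_URL = "https://openspeech.bytedance.com/api/v3/tts/unidirectional"
DEFAULT_RESOURCE = "volc.service_type.10029"  # TTS 1.0 V3 resource, see the quirk below


def volc_headers(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: build Volcano TTS request headers from env vars."""
    missing = [k for k in ("VOLC_TTS_APPID", "VOLC_TTS_ACCESS_TOKEN") if not env.get(k)]
    if missing:
        raise RuntimeError(f"missing {', '.join(missing)}; did you source ~/code/.env?")
    return {
        "X-Api-App-Id": env["VOLC_TTS_APPID"],            # 10-digit speech App ID
        "X-Api-Access-Key": env["VOLC_TTS_ACCESS_TOKEN"],  # 32-char console token
        "X-Api-Resource-Id": env.get("VOLC_TTS_RESOURCE", DEFAULT_RESOURCE),
        "Content-Type": "application/json",
    }
```

Failing early on missing credentials gives a clearer message than the 401 the gateway would otherwise return.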

### Resource ID — important quirk

The Volcano docs list `seed-tts-2.0` as the "TTS 2.0 (recommended)" resource, but a typical TTS-SeedTTS2.0 console instance does **not** include the popular `*_bigtts` speaker catalog (爽快斯斯, 高冷御姐, 开朗姐姐, etc.). Requesting those speakers against `seed-tts-2.0` returns `200 code=55000000 "resource ID is mismatched with speaker related resource"`, i.e. an HTTP 200 whose body carries an error code. The fix is to use `volc.service_type.10029` (the TTS 1.0 V3 resource) instead: the bigtts speakers sound the same there, and all of them work against it. The bundled `dub.py` defaults to `volc.service_type.10029`; set the `VOLC_TTS_RESOURCE` env var to override it if your instance differs.

Other 401/403 errors:

- `401 code=45000010 "load grant: requested grant not found in SaaS storage"`: the App ID + key pair is valid at the gateway, but the user has not activated this resource. They must open 火山引擎 (Volcano Engine) → 语音技术 (Speech Technology) → 语音合成大模型 (Large-Model Speech Synthesis) → 实例管理 (Instance Management) and 开通 (activate) the service. There is no workaround.
- `403 code=45000030` — the speaker isn't included in the user's instance bundle.
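When a run fails, turning these codes into an actionable message saves a trip back to this page. The table and function below are a hypothetical sketch that only encodes the three cases described in this section; other codes fall through unexplained.

```python
# Hypothetical remediation table for the Volcano error codes described above.
VOLC_ERROR_HINTS = {
    55000000: "Speaker not served by this resource; use volc.service_type.10029 "
              "or set VOLC_TTS_RESOURCE.",
    45000010: "Resource not activated; activate it under 实例管理 (Instance Management) "
              "in the Volcano console.",
    45000030: "Speaker not included in this instance bundle; pick another speaker.",
}


def explain_volc_error(code: int) -> str:
    """Return a human-readable hint for a Volcano error code."""
    return VOLC_ERROR_HINTS.get(code, f"Unrecognized Volcano error code {code}")
```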

### Response format

Despite what the docs loosely imply, the response is **streaming NDJSON**, not a single JSON object and not raw audio bytes. Each line is a separate JSON event carrying a base64-encoded MP3 chunk in `data`. The final event has `code: 20000000`, which is this API's success code (not `code: 0`). Decode and concatenate the chunks in order to get the full MP3.

```python
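import base64, json, requests


def fetch_mp3(url: str, headers: dict, payload: dict) -> bytes:
    """POST one TTS request and reassemble the streamed NDJSON into a single MP3."""
    chunks = []
    with requests.post(url, headers=headers, json=payload, timeout=60, stream=True) as r:
        if r.status_code != 200:
            # 401/403 bodies carry the Volcano error code (e.g. 45000010)
            raise RuntimeError(f"HTTP {r.status_code}: {r.text}")
        for line in r.iter_lines():
            if not line:
                continue  # skip blank keep-alive lines
            evt = json.loads(line)  # one JSON event per line
            # 20000000 is this API's success code; anything else is an error
            if evt.get("code") not in (0, None, 20000000):
                raise RuntimeError(f"code={evt.get('code')} {evt.get('message')}")
            if evt.get("data"):
                chunks.append(base64.b64decode(evt["data"]))  # base64 MP3 chunk
    return b"".join(chunks)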
```

### Speaker catalog (verified working under `volc.service_type.10029`)

Full list at volcengine.com/docs/6561/1257544 — but availability depends on your instance bundle. Confirmed-working female voices for the typical SeedTTS-2.0 starter instance:

| Speaker ID | Chinese name (gloss) | Feel |
| --- | --- | --- |
| `zh_female_gaolengyujie_moon_bigtts` | 高冷御姐 (aloof mature woman) | **Best for contemplative/spiritual content.** Mature, restrained, calm. |
| `zh_female_kailangjiejie_moon_bigtts` | 开朗姐姐 (cheerful big sister) | Warm older-sister storytelling. |
| `zh_female_shuangkuaisisi_moon_bigtts` | 爽快斯斯 (brisk Sisi) | Versatile, conversational baseline. |
| `zh_female_linjianvhai_moon_bigtts` | 邻家女孩 (girl next door) | Casual, lifestyle-vlog. |
| `zh_female_yuanqinvyou_moon_bigtts` | 元气女友 (energetic girlfriend) | Lively, upbeat. |
| `zh_female_meilinvyou_moon_bigtts` | 美丽女友 (beautiful girlfriend) | Soft, intimate. |
| `zh_female_shuangkuaisisi_emo_v2_mars_bigtts` | 斯斯情感版 (Sisi, emotional edition) | Full emotional range; pair with explicit emotion + scale. |
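Because availability varies by instance, it can help to flag speakers outside this verified list before spending an API call. The sketch below is hypothetical: the `pick_speaker` name and the choice of 爽快斯斯 as default (the "conversational baseline" voice) are assumptions, not what `dub.py` does.

```python
import warnings
from typing import Optional

# Speakers from the table above, verified under volc.service_type.10029.
VERIFIED_SPEAKERS = {
    "zh_female_gaolengyujie_moon_bigtts": "contemplative, spiritual",
    "zh_female_kailangjiejie_moon_bigtts": "warm storytelling",
    "zh_female_shuangkuaisisi_moon_bigtts": "conversational baseline",
    "zh_female_linjianvhai_moon_bigtts": "casual lifestyle vlog",
    "zh_female_yuanqinvyou_moon_bigtts": "lively, upbeat",
    "zh_female_meilinvyou_moon_bigtts": "soft, intimate",
    "zh_female_shuangkuaisisi_emo_v2_mars_bigtts": "full emotional range",
}
# Hypothetical default: the versatile, conversational baseline voice.
DEFAULT_SPEAKER = "zh_female_shuangkuaisisi_moon_bigtts"


def pick_speaker(speaker_id: Optional[str] = None) -> str:
    """Return the requested speaker, warning (not failing) if it is unverified."""
    speaker = speaker_id or DEFAULT_SPEAKER
    if speaker not in VERIFIED_SPEAKERS:
        warnings.warn(f"{speaker} is unverified; it may fail with code 55000000 or 45000030")
    return speaker
```

Warning rather than raising keeps the door open for instances whose bundle includes more speakers than the typical one.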

These voices return 55000000 against the typical instance even though the doc lists them: `vv_uranus_bigtts`, `wenroushunv_moon_bigtts`, `qingxin_moon_bigtts`, `yingmaoxiaoyuan_moon_bigtts`, `tianxinxia
