Skill · 107 repo stars · updated 4d ago

wjs-converting-text-to-video

This Claude Code skill transforms a WeChat-style Chinese article (article.md) into a 1080×1920 portrait short video lasting 30-90 seconds. It combines Volcano Engine TTS narration, HyperFrames CSS/GSAP animations synchronized to scene transitions, subtle sound effects, and abstract watercolor backgrounds, and renders them into an MP4 file suitable for social platforms like WeChat Moments, Douyin, or Xiaohongshu. Use it when a user asks to convert an existing article into a video with a phrase like "把这篇文章做成视频" ("turn this article into a video") or with a direct command trigger.

Repository: claude-skills

Install in Claude Code

git clone --depth 1 https://github.com/jianshuo/claude-skills /tmp/wjs-converting-text-to-video && cp -r /tmp/wjs-converting-text-to-video/wjs-converting-text-to-video ~/.claude/skills/wjs-converting-text-to-video

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# wjs-converting-text-to-video

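Turns a Wang Jianshuo (王建硕) style WeChat Official Account `article.md` into a **1080×1920 portrait, 30-90 second** Chinese narrated short video: TTS narration + HyperFrames CSS/GSAP animation + abstract watercolor background + transition SFX. Outputs an MP4 for WeChat Channels (视频号) / Douyin / Xiaohongshu / Reels.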

## What this skill produces

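| Dimension | Default |
|---|---|
| Size | 1080×1920 portrait (9:16) |
| Duration | 30-90 seconds |
| Scene count | 5-10 |
| Narration | Volcano Engine (火山引擎) TTS, default voice Ahu (阿虎) conversational male |
| Background | Abstract watercolor generated by GPT Image 2 (`bg.png`) + blur 30 + warm-black semi-transparent overlay |
| Font | Noto Sans SC, hero weight 900, main text in warm cream white |
| Output | `<article-folder>/<slug>.mp4` (a sibling of `video/`, not inside `video/`) |
| Publishing | Uploaded to YouTube automatically: Portrait → Shorts, Landscape → regular video; a re-render replaces the old video (uploads do not accumulate) |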

## When this skill fires

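- The user already has an `article.md` and says 「做成视频」 ("make it a video"), 「做一个解说」 ("make a narrated explainer"), or 「讲一遍」 ("talk it through")
- The user runs `/wjs-converting-text-to-video <article-folder>`
- The user makes a batch request such as 「把昨天发的那 X 篇都做成视频」 ("turn all X articles published yesterday into videos")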

## When NOT to use

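- There is no article draft, only an idea → first write article.md with `/wjs-publishing-wechat`, then come back
- The user wants subtitle burn-in / translation / dub replacement → use `/wjs-burning-subtitles` / `/wjs-dubbing-video` / `/wjs-localizing-video`
- The video must be in English, Spanish, or another non-Chinese language → this skill focuses on Chinese TTS (Volcano Engine, 火山引擎); for non-Chinese, use the tts command bundled with hyperframes (kokoro is acceptable for English)
- Landscape 16:9 → this skill defaults to portrait; switch to landscape only when the user explicitly asks for it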

## Core Principle

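**A video is not a visualized read-aloud of the article; it is a visual reconstruction of it.**

Each scene is a self-contained visual moment: one contrast, one parallelism, one number, one metaphor. Text fills the screen in bold sans-serif, with key words highlighted in orange. The background is abstract watercolor (softened by blur), and the overall tone is steady, restrained, and punchy.

**Rhythm > templates.** If a 5-10 scene video uses the same "two-line contrast" layout from start to finish, it is not a video, it is a slideshow. **A modern feel comes from contrast**: extreme font-size differences, asymmetric layouts, short scenes alternating with long ones, text-only scenes alternating with geometric-element scenes, watercolor-backed scenes alternating with bright-color punch scenes.

**The default is mediocre.** If you only pick the few easiest templates from the top of the table, the result will inevitably be a flat "two-line format". The mix ratios in [Step 1b Scene Mix Rule](#step-1b-scene-mix-rule强制) are mandatory.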

## Workflow

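### Step 1: Design 5-10 visual moments

Read `<article-folder>/article.md` and split it along its argument structure into 5-10 scenes (keep the total duration within 30-90 seconds). A short article (1-2 core points) gets 5-6 scenes / 30-50s; a long article gets 8-10 scenes / 60-90s. Each scene has one passage of narration plus one clear visual skeleton.

**Template table: 6 categories, 18 templates in total, mix and match as needed**: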

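#### A. Hero / Punch (high-contrast climax, ≥1 per video, duration ≤4s)
| Template | Best for |
|---|---|
| **A1. Full-screen single-word hero** | A 1-3 character climax word filling the screen, font size 280-400px |
| **A2. Outline hero** | Hollow text: `-webkit-text-stroke: 4px #f5efe5; color: transparent;` |
| **A3. Color-flip punch** | The whole-screen background switches to a bright color (orange / red / gold / emerald, etc.) with reversed-out white text |
| **A4. Gradient text hero** | Large text with `background: linear-gradient(...); -webkit-background-clip: text;` |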

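#### B. Contrast (opposition structure, 1-2 per video, duration 5-8s)
| Template | Best for |
|---|---|
| **B1. Two-line contrast + strikethrough** | "Before X, now Y" or "Not A, but B"; **at most 2 per video** |
| **B2. Left/right split-screen contrast** | The screen is divided in two (optionally with a vertical divider line) |
| **B3. Diagonal contrast** | Top-left ↔ bottom-right, with plenty of whitespace in between |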

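#### C. List / Structure (parallel items, 1-2 per video, duration 6-10s)
| Template | Best for |
|---|---|
| **C1. N cards in a row** | 3-5 parallel items, using deep warm black + a single-color border |
| **C2. Vertically stacked keywords** | 6-8 parallel items, optionally with large numerals 01-08 |
| **C3. True grid** | 2×2 / 3×2 grid with an icon + label per cell (portrait width is limited, so 4 columns in a row get cramped) |
| **C4. Staircase / staggered list** | Each item's `margin-left` increases |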

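#### D. Stat / Data (numeric climax, ≥1 per video, duration 4-6s)
| Template | Best for |
|---|---|
| **D1. Number ticker** | Rolling animation from 0 → N (`gsap.to({textContent})`) |
| **D2. Number + label** | Main number at 200-400px + a 60-80px explanation |
| **D3. Progress bar / timeline** | Horizontal progress bar + nodes |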

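#### E. Quote / Climax (where the punchline lands, 1-2 per video, duration 6-10s)
| Template | Best for |
|---|---|
| **E1. Paragraph-level hero text** | One punchline at 60-100px, left-aligned + an emphasis bar on the left |
| **E2. Giant quotation mark + body text** | A huge semi-transparent opening quotation mark as background decoration |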

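#### F. Decoration / Geometry (rhythm seasoning, optional)
| Template | Best for |
|---|---|
| **F1. Grid + spinner / progress bar** | Several things happening on screen concurrently |
| **F2. Speech bubble ↔ response** | Character A says → character B does |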

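**Keep each scene's narration to 3-12 seconds** (short punch 3-4s, long breath 10-12s; **do not make them all 5-7s**). All scenes together total **30-90 seconds**; do not exceed 90 seconds. A short article gets a short video: 5 scenes × 6s = 30s is perfectly acceptable.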
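
The duration limits above can be checked mechanically before any audio is generated. The sketch below is a minimal, hypothetical helper (the name `check_duration_budget` and the message wording are assumptions, not part of the skill). It takes estimated per-scene narration lengths in seconds and reports violations. The uniform 5-7s case is reported only as a warning, since the text accepts 5 scenes × 6s for a short article.

```python
from typing import List

MIN_SCENE_S, MAX_SCENE_S = 3.0, 12.0   # per-scene narration bounds from the text
MIN_TOTAL_S, MAX_TOTAL_S = 30.0, 90.0  # whole-video bounds from the text


def check_duration_budget(durations: List[float]) -> List[str]:
    """Return human-readable findings; an empty list means the plan passes."""
    findings = []
    for i, d in enumerate(durations, start=1):
        if not MIN_SCENE_S <= d <= MAX_SCENE_S:
            findings.append(f"scene {i}: {d}s is outside 3-12s")
    total = sum(durations)
    if not MIN_TOTAL_S <= total <= MAX_TOTAL_S:
        findings.append(f"total {total}s is outside 30-90s")
    # "Do not make them all 5-7s": flagged as a warning, not a hard failure.
    if durations and all(5.0 <= d <= 7.0 for d in durations):
        findings.append("warning: every scene is 5-7s; add a short punch and a long breath")
    return findings
```

For instance, `check_duration_budget([3, 8, 4, 11, 6])` returns an empty list, while five 6-second scenes return only the monotony warning.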

### Step 1b: Scene Mix Rule（强制）

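**After writing the 5-10 scene designs, self-check against the checklist below. If any item fails → go back and adjust.**

#### Mix hard rules
- [ ] ≥1 scene from each of categories A, C, D, and E
- [ ] ≤2 B1 templates (two-line strikethrough; historically the most overused)
- [ ] ≥1 A3 color-flip scene (bright background, reversed-out white text)
- [ ] ≥4 different template categories (at least 4 of A/B/C/D/E/F)
- [ ] ≤2 consecutive scenes from the same category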
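
The mix rules above are simple enough to verify in code. The following is a minimal sketch, assuming each scene is tagged with a template ID such as `"A3"` or `"B1"` from the template table; the function name `check_scene_mix` and the message strings are hypothetical, not something the skill ships.

```python
from typing import List


def check_scene_mix(templates: List[str]) -> List[str]:
    """Check template IDs such as "A3" or "B1" against the mix rules.

    Returns the violated rules; an empty list means every rule passes.
    """
    categories = [t[0] for t in templates]  # "A3" -> "A"
    problems = []
    for required in "ACDE":
        if required not in categories:
            problems.append(f"need at least one category {required} scene")
    if templates.count("B1") > 2:
        problems.append("more than two B1 scenes")
    if "A3" not in templates:
        problems.append("need at least one A3 color-flip scene")
    if len(set(categories)) < 4:
        problems.append("fewer than four distinct categories")
    # No more than two consecutive scenes may share a category.
    for i in range(len(categories) - 2):
        if categories[i] == categories[i + 1] == categories[i + 2]:
            problems.append(f"scenes {i + 1}-{i + 3} all use category {categories[i]}")
            break
    return problems
```

A plan like `["A1", "B1", "C2", "D1", "A3", "E1"]` passes every rule, whereas a plan that opens with three B1 scenes fails both the B1 cap and the consecutive-category rule.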

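#### Rhythm hard rules
- [ ] Scene duration span ≥ 6s, with the shortest ≤ 4s and the longest ≥ 9s
- [ ] ≥2 rhythm switches of the form "short → long → short" or "long → short"
- [ ] Font-size span ≥ 240px (largest hero ≥ 320px, smallest ≤ 80px)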
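
The rhythm rules can be sketched the same way. The text does not define a "switch" numerically, so the code below assumes one: a drop of at least `min_drop_s` seconds between neighbouring scenes counts as one "long → short" switch (a "short → long → short" run ends in such a drop). That threshold, the function name `check_rhythm`, and the input shape are all assumptions made for illustration.

```python
from typing import List


def check_rhythm(durations: List[float], font_sizes_px: List[int],
                 min_drop_s: float = 3.0) -> List[str]:
    """Check scene durations (seconds) and the font sizes used in the video."""
    problems = []
    shortest, longest = min(durations), max(durations)
    if longest - shortest < 6 or shortest > 4 or longest < 9:
        problems.append("durations: need span >= 6s, shortest <= 4s, longest >= 9s")
    # Assumption: a switch is a drop of at least min_drop_s between neighbours.
    switches = sum(1 for a, b in zip(durations, durations[1:]) if a - b >= min_drop_s)
    if switches < 2:
        problems.append(f"only {switches} long-to-short switch(es); need >= 2")
    smallest, largest = min(font_sizes_px), max(font_sizes_px)
    if largest - smallest < 240 or largest < 320 or smallest > 80:
        problems.append("font sizes: need span >= 240px, largest >= 320px, smallest <= 80px")
    return problems
```

With durations `[3, 10, 4, 9, 3.5, 6]` and font sizes `[360, 72, 120]` all three checks pass; a plan of five 6-second scenes fails the duration and switch checks.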

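#### Layout hard rules
- [ ] ≥2 scenes are not centered (corner-anchored, diagonal, left-aligned, staircase, etc.)
- [ ] ≥1 scene has whitespace covering ≥ 60% of the screen (breathing room)
- [ ] ≥1 scene contains geometric decoration (thick lines, color blocks, arrows, dots, large numerals)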

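#### Color hard rules
- [ ] **Most scenes have no `background:` color**, so the watercolor bg-image shows through; only A3 color-flip uses a solid-color background
- [ ] Color-flip scenes go beyond orange / blue / white (deep red / deep gold / emerald / pine teal / dark purple, etc. are all fine)
- [ ] Emphasis uses at least 2-3 colors (blue for technical terms, gold for value terms, green for growth terms, red for warning terms)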

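#### Anti-monotony self-check
1. Shrink screenshots of all scenes to thumbnails and line them up side by side: **can you tell them apart at a glance**? If all 8 look the same → redo
2. Do scenes 1, 4, and 7 differ in visual density? Some should be dense, others minimal
3. Is there a "meta-rhythm"? For example, A opening → 3 B/C scenes for development → D climax → E closing, which has more dramatic arc than a linear layout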

### Step 2: Write `narration_chunks.json`

```json
[
  {"id": "s01", "text": "我们以前，是 AI 的领导。现在，我们就是它的维修工。"},
  {"id": "s02", "text": "..."}
]
```

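**Narration writing details**:
- Make it more colloquial and clipped than article.md; use plenty of commas and periods so the TTS pauses naturally
- Mixed numbers / English are OK ("Claude Code", "100 倍"); Volcano can read them all
- No parenthetical notes, no `...`, no dashes `——` (the TTS will read out "破折号", the Chinese word for "dash")
- Remove `**bold markdown**` carried over from article.md and keep plain text only
- **Strip Baixing (百姓网) related facts**: if article.md contains 「百姓网」, 「百姓网现在 X 人」 ("Baixing now has X people"), 「百姓网员工」 ("Baixing employees"), or similar, strip or generalize them ("百姓网现在 158 个人" → "现实里没几个真人"). This is outdated information and must not go into the video. Likewise, visuals must not show a "百姓网" label or a "158 人" stat. See [[no-baixing-facts]]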
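
Most of the rules above are pattern checks, so they can be linted before the TTS step. The sketch below is a hypothetical helper, not part of the skill: `lint_chunks` and the `FORBIDDEN` table are assumed names, and matching full-width parentheses alongside ASCII ones is an assumption about what counts as a parenthetical note. It expects the `narration_chunks.json` shape shown earlier.

```python
import json
import re
from typing import Dict, List

# Patterns the narration rules forbid. The Baixing entry follows the
# strip-or-generalize rule; the parenthesis set is an assumption.
FORBIDDEN = {
    "parenthetical note": re.compile(r"[()（）]"),
    "ellipsis": re.compile(r"\.\.\."),
    "dash": re.compile(r"——"),
    "markdown bold": re.compile(r"\*\*"),
    "Baixing fact": re.compile(r"百姓网"),
}


def lint_chunks(chunks: List[Dict[str, str]]) -> List[str]:
    """Return one message per (chunk, rule) violation."""
    findings = []
    for chunk in chunks:
        for label, pattern in FORBIDDEN.items():
            if pattern.search(chunk["text"]):
                findings.append(f'{chunk["id"]}: {label}')
    return findings


sample = json.loads("""[
  {"id": "s01", "text": "我们以前，是 AI 的领导。现在，我们就是它的维修工。"},
  {"id": "s02", "text": "百姓网现在 158 个人——**很少**。"}
]""")
print(lint_chunks(sample))
```

Here `s01` is clean and `s02` trips the dash, bold, and Baixing rules. A lint like this only catches surface patterns; making the text more colloquial still needs a human or model pass.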

### Step 3: Generate the TTS narration

```bash
cd <article-folder>/video
python3 tts_narration.py
```

The script defaults to `zh_male_ahu_conversation_wvae_bigtts` (Ahu, 阿虎, conversational), inserts 0.35s of silence between chunks, and outputs `narration.mp3` + `timing.json`.
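
To make the 0.35s gap concrete, here is a minimal sketch of how per-chunk start and end times could be derived once each chunk's audio length is known. It is not the code in `tts_narration.py`; the function name `build_timing` and the `{"id", "start", "end"}` record shape are assumptions, and the real `timing.json` may be laid out differently.

```python
from typing import Dict, List

GAP_S = 0.35  # silence inserted between chunks, per the text


def build_timing(durations: Dict[str, float], gap_s: float = GAP_S) -> List[dict]:
    """Lay chunks end to end with gap_s of silence between neighbours.

    The record shape is hypothetical; the real timing.json may differ.
    """
    timing, cursor = [], 0.0
    for chunk_id, seconds in durations.items():
        timing.append({"id": chunk_id,
                       "start": round(cursor, 3),
                       "end": round(cursor + seconds, 3)})
        cursor += seconds + gap_s  # next chunk starts after the silence
    return timing
```

With chunks of 4.0s and 6.5s, the second chunk starts at 4.35s and ends at 10.85s, which is the kind of offset the scene animations would need to line up with.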

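**Volcano TTS caveats (pitfalls already hit)**:
- Use resource `volc.service_type.10029` and pick a `zh_*_*_bigtts` speaker
- **Never pass `emotion` / `emotion_scale`**: most `_bigtts` voices return `data: null` and fail silently
- **Never use kokoro** (the tts bundled with hyperframes): its Chinese quality is poor and the user has explicitly rejected it
- **Avoid** `zh_male_jieshuonansheng_mars_bigtts`: when the text contains English proper names (e.g. "Claude Code") it hallucinates in a loop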
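
These caveats can be enforced with a small guard in front of whatever code builds the request. The sketch below is an illustration under assumptions: `sanitize_tts_options` is a hypothetical name, and `options` is a stand-in dict of extra fields (the `speed` key in the usage note is made up), not the Volcano request schema.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict

BLOCKED_VOICES = {"zh_male_jieshuonansheng_mars_bigtts"}  # loops on English names
BLOCKED_OPTIONS = {"emotion", "emotion_scale"}            # cause silent data: null


def sanitize_tts_options(voice: str, options: Dict[str, Any]) -> Dict[str, Any]:
    """Validate the voice ID and drop options the caveats warn against.

    `options` is a hypothetical dict of extra request fields.
    """
    if not fnmatchcase(voice, "zh_*_*_bigtts"):
        raise ValueError(f"{voice!r} does not look like a zh_*_*_bigtts speaker")
    if voice in BLOCKED_VOICES:
        raise ValueError(f"{voice!r} is on the avoid list")
    return {k: v for k, v in options.items() if k not in BLOCKED_OPTIONS}
```

Calling it with the default voice and `{"speed": 1.0, "emotion": "happy"}` returns `{"speed": 1.0}`; passing `kokoro` or the avoided voice raises `ValueError` instead of failing silently later.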

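**Fallback voices** (in recommended order):
- `zh_male_ahu_conversation_wvae_bigtts` (Ahu, 阿虎, conversational): the default, natural colloquial speech
- `zh_male_M392_conversation_wvae_bigtts`: same wvae series
- `zh_male_wennuanahu_moon_bigtts` (Warm Ahu, 温暖阿虎): warmer, with a broadcaster feel
- `zh_male_silang_mars_bigtts` (Silang, 思朗): steady and thoughtful, strongly dramatic
- `zh_male_baqiqingshu_mars_bigtts` (Baqi, 霸气): more forceful

To switch voices: `python3 tts_narration.py --voice zh_male_silang_mars_bigtts`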

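### Step 4: Generate the watercolor background image

The bg-image sets the visual keynote (a softened abstract watercolor). **Do not use the article's `illustration.png`**: a hand-drawn diagram has too much detail and blurs into a uniform dark mud (visually it is still pure black). Always use a purpose-generated abstract watercolor.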

```bash
~/.claude/skills/wjs-converting-text-to-video/scripts/generate-bg.sh <article-folder> <theme>
```

Choose `<theme>` based on the article's topic:

| theme | Palette | Best for |
|---|---|---|
| `personal` | bright warm yellow, soft coral pink, terracotta, sage green, cream | Personal, handmade, warm |
| `tech` | cool teal, electric blue, deep purple, mint, white | AI, technology, data |
| `reflection` | sage green, dusty blue, la
