meta-short-drama
**meta-short-drama** generates multi-shot video dramas (1-10 shots, default 5) from a topic. It extracts render style and character details from the user's request, drafts a shot-by-shot script, and pauses for free-form review and optional revision before any image or video generation. It then composites title and ending cards with in-language burned subtitles into a final MP4 and saves the script. Use it only when a user explicitly asks to create a short-drama or 短剧 video; do not use it for slide decks, single images, isolated scripts, or analysis tasks.
git clone --depth 1 https://github.com/opensquilla/opensquilla /tmp/meta-short-drama && cp -r /tmp/meta-short-drama/src/opensquilla/skills/bundled/meta-short-drama ~/.claude/skills/meta-short-drama
# meta-short-drama
End-to-end short-drama generator with one free-form user-review gate
before any paid step. **1-10 shots** (default 5), title card + ending
card, in-language burned subtitles, and the generated script is saved
to disk regardless of outcome.
## What it does
1. **`intake_extract`** scans the user message for RENDER_STYLE,
IDENTITY_ANCHOR, and N_SHOTS (1-10). Fills in defaults when missing.
2. **`script_draft`** calls `ai-video-script` with the inferred values
pasted verbatim into every shot prompt.
3. **`review_gate`** — single free-form pause. The user can approve,
rewrite render style / character / shot count / shot details, or
cancel in plain language.
4. **`review_normalize`** parses the free-form reply.
5. **`script_revised`** (conditional) redrafts the script when the reply contains overrides.
6. **`final_script`** echoes the canonical script.
7. **`script_save`** writes `script.txt` to the run folder
(always — even on cancel, so the user keeps the draft).
8. **`title_extract` / `subtitle_extract` / `ending_text_extract`**
pull cover/ending text in the script's language.
9. **`cover_image` + `cover_video`** — Pillow title card + 2s Ken-Burns
clip (`0_cover.mp4` — sorts first in merge).
10. **Per-shot extracts × 10** — for shots 1..10 the LLM emits either
the real prompts/duration OR the literal sentinel `__SHOT_ABSENT__`.
Image/video steps gate on the sentinel so unused slots stay dormant.
11. **Image generation per active shot** — `nano-banana-pro`, retry +
fallback model + placeholder PNG (image step never aborts DAG).
12. **`reference_prompt_extract` + `reference_image`** — one extra
`nano-banana-pro` call produces `reference.png`, a full-cast neutral
lineup of every named character on a neutral backdrop. Used as the
universal IDENTITY anchor for every shot's seedance call so the
character does not drift across cuts (nano-banana would otherwise
re-roll subtly different faces per shot).
13. **Video generation per active shot** — `seedance-2.0`, retry twice;
on persistent refusal the Ken-Burns substitute fires using the
shot's PNG. Each shot passes TWO reference images to seedance,
AND the per-shot prompt is wrapped with an explicit "Assets
Mapping" preamble in the upstream JiMeng convention so seedance
knows the role of each reference:
    - reference[1] = `reference.png`: the full-cast identity anchor, used
      strictly for character likeness (faces, hair, outfits, accessories)
      across all shots.
    - reference[2] = `N_shot.png`: this shot's scene composition
      reference, used for camera angle, framing, blocking, prop
      placement, and background layout.

    The Assets Mapping preamble is in English even when the per-shot
    directive is Chinese; seedance parses English instruction prefixes
    reliably regardless of the user-content language. Empty or missing
    references are still filtered before the API call (so direct CLI
    callers using a single anchor remain backwards-compatible).
14. **`ending_image` + `ending_video`**: Pillow "完" / "THE END" card plus a
1.5s Ken-Burns clip (`99_ending.mp4`, which sorts last).
15. **`merge`**: `video-merger` stitches `0_cover`, the active shots, and
`99_ending` via numeric-prefix sort, with ffmpeg cross-fade transitions.
16. **`subtitles_srt`**: SRT cues from VOICEOVER per shot, shifted by
the 2-second cover duration so cue timing matches the merged
timeline.
17. **`subtitled_final`**: `subtitle-burner` burns the SRT into
`final_subtitled.mp4`.
18. **`deliver`**: always runs and branches on DECISION. Lists the saved
script path so the user keeps a copy regardless.
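
The sentinel gating used for the ten per-shot slots can be illustrated with a short sketch. This is not the skill's actual code: the helper name `active_shots` and the dict shape are hypothetical, and only the literal `__SHOT_ABSENT__` and the fixed 10-slot layout come from the steps above.

```python
SHOT_ABSENT = "__SHOT_ABSENT__"  # literal sentinel named in the per-shot extract step
MAX_SLOTS = 10                   # the DAG always declares 10 shot slots


def active_shots(extracts: dict[int, str]) -> list[int]:
    """Return the slot numbers whose extract is a real prompt.

    `extracts` maps a slot number (1..10) to the LLM output for that slot
    (hypothetical shape). Slots missing from the dict are treated as absent.
    """
    return [
        slot
        for slot in range(1, MAX_SLOTS + 1)
        if extracts.get(slot, SHOT_ABSENT).strip() != SHOT_ABSENT
    ]


# A 3-shot drama: slots 4..10 carry the sentinel and stay dormant.
extracts = {n: f"prompt for shot {n}" for n in (1, 2, 3)}
extracts.update({n: SHOT_ABSENT for n in range(4, 11)})
print(active_shots(extracts))
```

Downstream image and video steps would then run only for the returned slot numbers.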
## Outputs
```
<workspace>/meta_short_drama/<slug>/
script.txt # full final script (always)
reference.png # full-cast identity reference (used by every shot_video)
0_cover.png 0_cover.mp4
1_shot.png 1_shot.mp4 ┐
2_shot.png 2_shot.mp4 ├ only for active shots (1..N_SHOTS)
... ┘
99_ending.png 99_ending.mp4
subs.srt
final.mp4 # merged, no subtitles
final_subtitled.mp4 # subtitled — the deliverable
```
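
The file names above are chosen so that a numeric-prefix sort yields the merge order. A plain lexicographic sort would place `10_shot.mp4` before `2_shot.mp4`, so the prefix has to be compared as an integer. The sketch below shows one way to do that; `merge_order` is a hypothetical helper, not `video-merger`'s actual implementation.

```python
import re
from pathlib import Path


def merge_order(clips: list[str]) -> list[str]:
    """Sort clip names by their leading integer prefix (hypothetical helper)."""

    def prefix(name: str) -> int:
        match = re.match(r"(\d+)_", Path(name).name)
        if match is None:
            raise ValueError(f"no numeric prefix: {name}")
        return int(match.group(1))

    return sorted(clips, key=prefix)


clips = ["99_ending.mp4", "10_shot.mp4", "2_shot.mp4", "0_cover.mp4", "1_shot.mp4"]
print(merge_order(clips))
```

Files without a numeric prefix, such as `final.mp4`, raise an error in this sketch, so a caller would need to pass only the clip files.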
## Dependencies
| Skill | Purpose | Models / Tools |
|---|---|---|
| `ai-video-script` | Structured shot list (1-10 shots) | LLM |
| `nano-banana-pro` | Per-shot first-frame PNG | OpenRouter Gemini 3.1 / 3 pro |
| `seedance-2-prompt` | Per-shot MP4 | OpenRouter Seedance 2.0 (or Volcengine ARK) |
| `video-still-animator` | Ken-Burns fallback / cover & ending clips | ffmpeg ≥ 5.0 |
| `video-merger` | Stitch cover + shots + ending | ffmpeg ≥ 5.0 |
| `srt-from-script` | VOICEOVER → SRT with cover offset | Python stdlib |
| `subtitle-burner` | Burn SRT into MP4 | ffmpeg + libass |
| `title-card-image` | Pillow cover + ending PNG cards | Pillow |
| (builtin) `write_file` | Save script.txt (no skill needed) | OpenSquilla builtin |
| `text-file-read` | Re-read script.txt after review pause | Python stdlib |
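
The cover offset applied by `srt-from-script` can be sketched with the standard library alone. The sketch assumes each shot's VOICEOVER spans that shot's full duration and ignores any overlap introduced by cross-fades. The function names and the `(duration, voiceover)` input shape are hypothetical, not the skill's real interface.

```python
COVER_SECONDS = 2.0  # cover clip length stated for the cover step


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(shots: list[tuple[float, str]], offset: float = COVER_SECONDS) -> str:
    """Build SRT text from (duration_s, voiceover) pairs, shifted by `offset`."""
    cues = []
    start = offset
    index = 0
    for duration, voiceover in shots:
        end = start + duration
        text = voiceover.strip()
        if text:  # shots without voiceover still advance the clock
            index += 1
            cues.append(
                f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
            )
        start = end
    return "\n".join(cues)


print(build_srt([(5.0, "She opens the door."), (4.0, "Nobody is home.")]))
```

With a 2-second cover, the first cue starts at `00:00:02,000` rather than at zero, which keeps the subtitles aligned with the merged timeline.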
Environment:
- `OPENROUTER_API_KEY` must be set.
- `ffmpeg` and `ffprobe` on PATH.
- Pillow installed (already in opensquilla deps).
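
A preflight check for the environment requirements above might look like the following sketch. `preflight` is a hypothetical helper, not part of the skill. It checks presence only and does not verify the ffmpeg version or libass support.

```python
import importlib.util
import os
import shutil


def preflight(env=None) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENROUTER_API_KEY"):
        problems.append("OPENROUTER_API_KEY is not set")
    for tool in ("ffmpeg", "ffprobe"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if importlib.util.find_spec("PIL") is None:  # Pillow imports as PIL
        problems.append("Pillow is not installed")
    return problems


for problem in preflight():
    print("missing:", problem)
```

Running a check like this before the review gate would surface configuration problems before any credits are spent.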
## Risk
`high` — writes files, spends real OpenRouter credits, runs ffmpeg
subprocesses. The review_gate ensures user consent before any paid step.
## Limits (v2)
- 1-10 shots; default 5. The DAG always declares 10 slots but
`__SHOT_ABSENT__` gating keeps unused slots dormant.
- Per-shot duration follows the script's DURATION_S (clamped 3-15s by
seedance API). Total drama length scales linearly.
- 9:16 portrait.
- Per-shot seedance failures fall back to Ken-Burns. Image step
has its own placeholder fallback. Prompt-extract llm_chats still
abort the run if they return malformed output.
- Concurrent runs with identical user_message collide on the same
slug-derived subdir.
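
The duration limit can be made concrete with a small estimate of total running time. The sketch mirrors the 3-15s clamp locally for estimation only, since the clamp itself is applied by the seedance API. It adds the 2s cover and 1.5s ending clips and ignores any overlap from cross-fade transitions. The helper names and the 5-second example durations are illustrative assumptions.

```python
COVER_S = 2.0                       # cover clip length
ENDING_S = 1.5                      # ending clip length
MIN_SHOT_S, MAX_SHOT_S = 3.0, 15.0  # per-shot clamp from the limits above


def clamp_duration(duration_s: float) -> float:
    """Clamp a requested DURATION_S into the 3-15s range."""
    return max(MIN_SHOT_S, min(MAX_SHOT_S, float(duration_s)))


def estimated_length(durations: list[float]) -> float:
    """Estimate total drama length in seconds (cross-fade overlap ignored)."""
    return COVER_S + sum(clamp_duration(d) for d in durations) + ENDING_S


# Default of 5 shots, assuming 5 seconds each for illustration.
print(estimated_length([5, 5, 5, 5, 5]))
```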
## When NOT to use
- Single image / single clip / script-only / stitch-only: use the
underlying skills directly.