ClaudeWave
Skill · 4.1k repo stars · updated today

meta-short-drama

**meta-short-drama** generates a multi-shot video drama (1-10 shots, default 5) from a topic. It extracts the render style and character details from the user's request, drafts a shot-by-shot script, and pauses for free-form review and optional revision before any image or video generation. It then merges a title card, the shots, and an ending card into a final MP4 with burned-in subtitles in the script's language, and saves the script to disk. Use it only when a user explicitly asks to create a short-drama or 短剧 video; do not use it for slide decks, single images, isolated scripts, or analysis tasks.

Install in Claude Code
git clone --depth 1 https://github.com/opensquilla/opensquilla /tmp/meta-short-drama && cp -r /tmp/meta-short-drama/src/opensquilla/skills/bundled/meta-short-drama ~/.claude/skills/meta-short-drama
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# meta-short-drama

End-to-end short-drama generator with one free-form user-review gate
before any paid step. It produces **1-10 shots** (default 5), a title
card and an ending card, and burned-in subtitles in the script's
language. The generated script is saved to disk regardless of outcome.

## What it does

1. **`intake_extract`** scans the user message for RENDER_STYLE,
   IDENTITY_ANCHOR, and N_SHOTS (1-10). Fills in defaults when missing.
2. **`script_draft`** calls `ai-video-script` with the inferred values
   pasted verbatim into every shot prompt.
3. **`review_gate`** — single free-form pause. The user can approve,
   rewrite render style / character / shot count / shot details, or
   cancel in plain language.
4. **`review_normalize`** parses the free-form reply.
5. **`script_revised`** (conditional) redrafts when overrides present.
6. **`final_script`** echoes the canonical script.
7. **`script_save`** writes `script.txt` to the run folder
   (always — even on cancel, so the user keeps the draft).
8. **`title_extract` / `subtitle_extract` / `ending_text_extract`**
   pull cover/ending text in the script's language.
9. **`cover_image` + `cover_video`** — Pillow title card + 2s Ken-Burns
   clip (`0_cover.mp4` — sorts first in merge).
10. **Per-shot extracts × 10** — for shots 1..10 the LLM emits either
    the real prompts/duration OR the literal sentinel `__SHOT_ABSENT__`.
    Image/video steps gate on the sentinel so unused slots stay dormant.
11. **Image generation per active shot** — `nano-banana-pro`, retry +
    fallback model + placeholder PNG (image step never aborts DAG).
12. **`reference_prompt_extract` + `reference_image`** — one extra
    `nano-banana-pro` call produces `reference.png`, a full-cast neutral
    lineup of every named character on a neutral backdrop. Used as the
    universal IDENTITY anchor for every shot's seedance call so the
    character does not drift across cuts (nano-banana would otherwise
    re-roll subtly different faces per shot).
13. **Video generation per active shot** — `seedance-2.0`, retry twice;
    on persistent refusal the Ken-Burns substitute fires using the
    shot's PNG. Each shot passes TWO reference images to seedance,
    AND the per-shot prompt is wrapped with an explicit "Assets
    Mapping" preamble in the upstream JiMeng convention so seedance
    knows the role of each reference:
      reference[1] = `reference.png` (full-cast identity anchor — used
                     strictly for character likeness / faces / hair /
                     outfits / accessories across all shots)
      reference[2] = `N_shot.png`    (this shot's scene composition
                     reference — used for camera angle, framing,
                     blocking, prop placement, background layout)
    The Assets Mapping preamble is in English even when the per-shot
    directive is Chinese — seedance parses English instruction prefixes
    reliably regardless of the user-content language. Empty / missing
    references are still filtered before the API call (so direct CLI
    callers using a single anchor remain backwards-compatible).
14. **`ending_image` + `ending_video`** — Pillow "完" / "THE END" card
    + 1.5s Ken-Burns clip (`99_ending.mp4` — sorts last).
15. **`merge`** — `video-merger` stitches `0_cover` + active shots
    + `99_ending` via numeric-prefix sort. ffmpeg cross-fade transitions.
16. **`subtitles_srt`** — SRT cues from VOICEOVER per shot, shifted by
    the 2-second cover duration so cue timing matches the merged
    timeline.
17. **`subtitled_final`** — `subtitle-burner` burns the SRT into
    `final_subtitled.mp4`.
18. **`deliver`** — always runs, branches on DECISION. Lists the saved
    script path so the user keeps a copy regardless.

## Outputs

```
<workspace>/meta_short_drama/<slug>/
    script.txt              # full final script (always)
    reference.png           # full-cast identity reference (used by every shot_video)
    0_cover.png  0_cover.mp4
    1_shot.png   1_shot.mp4   ┐
    2_shot.png   2_shot.mp4   ├ only for active shots (1..N_SHOTS)
    ...                       ┘
    99_ending.png 99_ending.mp4
    subs.srt
    final.mp4               # merged, no subtitles
    final_subtitled.mp4     # subtitled — the deliverable
```
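
The file names above rely on a numeric-prefix sort, which matters once a run has ten shots: a plain string sort would place `10_shot.mp4` before `1_shot.mp4`. The sketch below shows one way such an ordering could be computed. The helper names `numeric_prefix` and `merge_order` are hypothetical and are not taken from `video-merger`.

```python
import re
from pathlib import PurePath


def numeric_prefix(name: str) -> int:
    """Return the leading integer of a file name such as '10_shot.mp4'."""
    match = re.match(r"(\d+)_", PurePath(name).name)
    if match is None:
        raise ValueError(f"no numeric prefix in {name!r}")
    return int(match.group(1))


def merge_order(clips: list[str]) -> list[str]:
    """Order clips by numeric prefix: cover (0), shots (1..N), ending (99)."""
    return sorted(clips, key=numeric_prefix)


clips = ["99_ending.mp4", "10_shot.mp4", "2_shot.mp4", "0_cover.mp4", "1_shot.mp4"]
ordered = merge_order(clips)
```

Only the prefixed clips belong in the input; `final.mp4` and `final_subtitled.mp4` carry no numeric prefix and would raise `ValueError` in this sketch.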

## Dependencies

| Skill | Purpose | Models / Tools |
|---|---|---|
| `ai-video-script` | Structured shot list (1-10 shots) | LLM |
| `nano-banana-pro` | Per-shot first-frame PNG | OpenRouter Gemini 3.1 / 3 pro |
| `seedance-2-prompt` | Per-shot MP4 | OpenRouter Seedance 2.0 (or Volcengine ARK) |
| `video-still-animator` | Ken-Burns fallback / cover & ending clips | ffmpeg ≥ 5.0 |
| `video-merger` | Stitch cover + shots + ending | ffmpeg ≥ 5.0 |
| `srt-from-script` | VOICEOVER → SRT with cover offset | Python stdlib |
| `subtitle-burner` | Burn SRT into MP4 | ffmpeg + libass |
| `title-card-image` | Pillow cover + ending PNG cards | Pillow |
| (builtin) `write_file` | Save script.txt (no skill needed) | OpenSquilla builtin |
| `text-file-read` | Re-read script.txt after review pause | Python stdlib |

Environment:
- `OPENROUTER_API_KEY` must be set.
- `ffmpeg` and `ffprobe` on PATH.
- Pillow installed (already in opensquilla deps).
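
A preflight check for the three requirements above might look like the following. This is only a sketch under assumptions: `preflight` is a hypothetical helper, not part of OpenSquilla, and it checks only that each requirement is present, not the ffmpeg version or the validity of the key.

```python
from __future__ import annotations

import importlib.util
import os
import shutil
from collections.abc import Callable, Mapping


def preflight(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], str | None] = shutil.which,
) -> list[str]:
    """Return human-readable problems; an empty list means ready to run."""
    problems = []
    if not env.get("OPENROUTER_API_KEY"):
        problems.append("OPENROUTER_API_KEY is not set")
    for tool in ("ffmpeg", "ffprobe"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # Pillow is imported as the top-level package "PIL".
    if importlib.util.find_spec("PIL") is None:
        problems.append("Pillow is not installed")
    return problems
```

Passing `env` and `which` as parameters keeps the check testable without touching the real environment; calling `preflight()` with no arguments inspects the current process.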

## Risk

`high` — writes files, spends real OpenRouter credits, runs ffmpeg
subprocesses. The review_gate ensures user consent before any paid step.

## Limits (v2)

- 1-10 shots; default 5. The DAG always declares 10 slots but
  `__SHOT_ABSENT__` gating keeps unused slots dormant.
- Per-shot duration follows the script's DURATION_S (clamped 3-15s by
  seedance API). Total drama length scales linearly.
- 9:16 portrait.
- Per-shot seedance failures fall back to Ken-Burns. Image step
  has its own placeholder fallback. Prompt-extract llm_chats still
  abort the run if they return malformed output.
- Concurrent runs with identical user_message collide on the same
  slug-derived subdir.
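
The slot gating and duration clamp described above can be illustrated with a short sketch. The slot representation (a dict with a `duration_s` key, or the sentinel string) and the name `plan_shots` are assumptions made for illustration. The clamp is mirrored locally here, whereas the text above attributes it to the seedance API.

```python
from __future__ import annotations

SHOT_ABSENT = "__SHOT_ABSENT__"  # sentinel literal named in this document
MAX_SLOTS = 10
MIN_DURATION_S, MAX_DURATION_S = 3.0, 15.0


def plan_shots(slots: list[str | dict]) -> list[tuple[int, float]]:
    """Return (slot number, clamped duration) for every active slot."""
    if len(slots) != MAX_SLOTS:
        raise ValueError(f"expected {MAX_SLOTS} slots, got {len(slots)}")
    plan = []
    for number, slot in enumerate(slots, start=1):
        if slot == SHOT_ABSENT:
            continue  # dormant slot: no image or video step would run
        requested = float(slot["duration_s"])
        clamped = min(max(requested, MIN_DURATION_S), MAX_DURATION_S)
        plan.append((number, clamped))
    return plan


# Three active shots followed by seven dormant slots.
slots = [{"duration_s": 2}, {"duration_s": 8}, {"duration_s": 20}]
slots += [SHOT_ABSENT] * 7
plan = plan_shots(slots)
```

In this example the requested 2 s and 20 s durations are pulled to the 3 s and 15 s bounds, and slots 4 through 10 produce no work.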

## When NOT to use

- Single image / single clip / script-only / stitch-only — use the
  underlying skills di

advanced-dubbing-studio (Skill)

Submit audio or video for multilingual dubbing, poll status, and download dubbed audio. Use when the user asks for dubbing, 多语言配音, 视频翻译配音, 译制片, or wants a source clip dubbed into another language.

ai-video-script (Skill)

Generate a structured short-video shooting script from a topic. Emits a strict, machine-parseable shot list (3 shots by default) with image prompt + video prompt + voiceover + on-screen text per shot. Trigger when the user asks for a video script, 分镜, 短视频文案, AI视频, 短剧脚本, or wants visual prompts ready for image/video generation.

cron (Skill)

Use when the user asks to schedule recurring tasks, one-off reminders, timers, or cron-style jobs through the OpenSquilla cron tool.

deep-research (Skill)

Multi-round research with explicit methodology, evidence tracking, and citation-tagged synthesis. Trigger on 'deep dive', 'research report', 'literature review', 'investigate X across sources', 'multi-round investigation'. Distinct from the `summarize` skill, which is a single-pass condensation; this skill maintains a state file across iterations, tracks coverage, and produces a long-form report with per-claim citations. Three execution stages: plan (scope into sub-questions), iterate (record evidence per round), compile (synthesize report). The skill itself does not fetch the web — it tells the host agent which fetches to perform via OpenSquilla's existing web tools, and records what comes back.

docx (Skill)

Read, edit, or create Microsoft Word `.docx` files. Trigger this skill whenever the user mentions a Word document, .docx file, contract, report, brief, memo, or asks to extract text, modify an existing doc, generate one from a brief, or audit tracked changes. Three execution paths: text-and-structure extraction, in-place edit-by-run (preserves styles), and create-from-scratch with python-docx. Falls back to OOXML unzip-and-patch for layout work python-docx cannot reach.

git-diff (Skill)

Capture the current git diff (staged, working-tree, or staged file list) as text. Direct shell call for workflows that need repository diffs without an LLM agent loop.

github (Skill)

GitHub operations via `gh` CLI: issues, PRs, CI runs, code review, API queries. Use when: (1) checking PR status or CI, (2) creating/commenting on issues, (3) listing/filtering PRs or issues, (4) viewing run logs. NOT for: complex web UI interactions requiring manual browser flows (use browser tooling when available), bulk operations across many repos (script with gh api), or when gh auth is not configured.

history-explorer (Skill)

Query the per-turn DecisionEntry log for skill co-occurrence patterns, meta-skill usage stats, and the router fixture corpus. Returns a JSON summary suitable for downstream LLM consumption. Used by meta-skill-creator's harvest step but also useful standalone for 'which skills did I use most this week?'