
ai-video-script

# ai-video-script

Generates a structured short-video shooting script by converting a topic, style, and duration into a machine-parseable shot list with strict formatting. Each shot includes an image prompt, video prompt, voiceover, and on-screen text designed for downstream parsing by video generation tools. Use this skill when a user requests a video script, 分镜 (storyboard), short-video copy, or AI video production assets in formats ready for image and video generation workflows.

Repository: opensquilla

Install in Claude Code


git clone --depth 1 https://github.com/opensquilla/opensquilla /tmp/ai-video-script && cp -r /tmp/ai-video-script/src/opensquilla/skills/bundled/ai-video-script ~/.claude/skills/ai-video-script

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# ai-video-script — structured short-video script generator

Turns a topic/keyword + style + duration into a strict-format shooting script
that the downstream `nano-banana-pro` and `seedance-2-prompt` skills can parse
without ambiguity. The default emits 3 shots; the caller may request any
count from 1 to 10 via `N_SHOTS`.

## Inputs

Free-text via `with.task` / `with.request`:
- Topic / product / story
- Target audience (optional)
- Style (轻松 casual / 专业 professional / 故事 story / 科普 popular science / 带货 product promotion); this is the narrative style, not the render style
- Total duration (15s, 30s, 60s default)
- Aspect ratio (9:16 default, 16:9 optional)
- `N_SHOTS` override (5 default, **1-10 allowed**)

Caller-supplied anchors (used verbatim — this skill never invents them):
- `with.render_style` — one-line aesthetic the per-shot prompts must end
  with. Examples: `2D anime illustration, flat colour, soft cel-shading`,
  `watercolour storybook illustration`, `cinematic photoreal 35mm grain`.
  If absent / empty, emit the literal sentinel `(render style missing)`
  into the RENDER_STYLE field so downstream parsers can fail loudly.
- `with.identity_anchor` — one-line description of the main character(s)
  that every shot must reproduce byte-for-byte. Example: `Lin, a
  25-year-old East Asian woman with chin-length black bob, almond eyes,
  wearing sage-green oversized knit sweater and gold round earrings`. If
  absent / empty, emit the literal sentinel `(identity anchor missing)`
  so callers can detect the gap before spending on image/video gen.

This skill does **not** choose render style or character identity; the
orchestrator (or its user_input clarify step) does. This separation lets
the same skill serve product ads (no human) and short dramas (with
locked characters) without baked-in defaults.
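
The sentinel fallback for the two anchors can be sketched as follows. This is a minimal illustration, not the skill's actual input handling: it assumes the caller's `with` block arrives as a plain dict, and the names `resolve_anchors` and `_or_sentinel` are hypothetical. Treating whitespace-only text as "empty" is also an assumption.

```python
# Hypothetical helper: the skill's real input handling is not shown here.
IDENTITY_ANCHOR_MISSING = "(identity anchor missing)"
RENDER_STYLE_MISSING = "(render style missing)"


def _or_sentinel(value, sentinel: str) -> str:
    # Assumption: None, "" and whitespace-only all count as "absent / empty".
    # Any other value is returned untouched so it stays byte-for-byte intact.
    if value is None or not str(value).strip():
        return sentinel
    return value


def resolve_anchors(with_block: dict) -> tuple[str, str]:
    """Return (identity_anchor, render_style) for the OVERVIEW block."""
    identity = _or_sentinel(with_block.get("identity_anchor"), IDENTITY_ANCHOR_MISSING)
    style = _or_sentinel(with_block.get("render_style"), RENDER_STYLE_MISSING)
    return identity, style
```

Because the sentinels are fixed literal strings, an orchestrator can compare against them directly before spending on image or video generation.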

## Output format (STRICT — orchestrators parse this)

Always emit exactly these top-level blocks, in this order:

```
=== OVERVIEW ===
TITLE: <one line>
DURATION_S: <int>
ASPECT_RATIO: <9:16|16:9>
STYLE: <one line>
AUDIENCE: <one line>
N_SHOTS: <int 1-10>
IDENTITY_ANCHOR: <copied verbatim from with.identity_anchor, or "(identity anchor missing)">
RENDER_STYLE: <copied verbatim from with.render_style, or "(render style missing)">

=== SHOT_1 ===
DURATION_S: <int 3-6>
CAMERA: <wide|medium|close-up + push/pull/pan/tilt/static>
IMAGE_PROMPT: <IDENTITY_ANCHOR verbatim>, <scene/action>, <RENDER_STYLE verbatim>, --ar 9:16
VIDEO_PROMPT: <IDENTITY_ANCHOR verbatim>, <one major action + camera move + duration hint>, <dialogue/voiceover/sound tags derived from VOICEOVER — see rule 11>, <RENDER_STYLE verbatim>, aspect_ratio: 9:16, no watermark, no logo, no subtitles
VOICEOVER: <one line, max 20 Chinese chars or 30 English words — kept verbatim for SRT subtitle burn-in>
ON_SCREEN_TEXT: <one short line or empty>

=== SHOT_2 ===
... (same fields, IMAGE_PROMPT and VIDEO_PROMPT must begin with the
exact same IDENTITY_ANCHOR bytes as SHOT_1)

=== SHOT_3 ===
... (same fields)
```

For any `N_SHOTS` between 1 and 10, emit exactly that many
`=== SHOT_K ===` blocks numbered 1..N_SHOTS, each with the same fields.
Do not emit shot blocks beyond `N_SHOTS`. Never skip a field; use the
literal value `none` for empty `ON_SCREEN_TEXT`.
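
To show why the single-line `KEY: value` layout matters, here is a rough sketch of how an orchestrator might read the format. The downstream skills' real parsers are not shown in this document, so `parse_script` and `check_shot_blocks` are hypothetical names and the error handling is only one possible choice.

```python
import re

# Matches the block headers used by the format, e.g. "=== SHOT_2 ===".
HEADER = re.compile(r"^=== (OVERVIEW|SHOT_\d+) ===$")


def parse_script(text: str) -> dict:
    """Split a script into {block_name: {FIELD: value}}, in document order."""
    blocks: dict = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate blocks
        match = HEADER.match(line)
        if match:
            name = match.group(1)
            if name in blocks:
                raise ValueError(f"duplicate block: {name}")
            current = blocks[name] = {}
            continue
        if current is None:
            raise ValueError(f"text before the first block header: {line!r}")
        # Split at the first colon only, so values such as "9:16" survive.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'KEY: value', got {line!r}")
        current[key.strip()] = value.strip()
    return blocks


def check_shot_blocks(blocks: dict) -> None:
    """Raise unless the blocks are exactly OVERVIEW, SHOT_1 .. SHOT_N_SHOTS."""
    n_shots = int(blocks["OVERVIEW"]["N_SHOTS"])
    expected = ["OVERVIEW"] + [f"SHOT_{k}" for k in range(1, n_shots + 1)]
    if list(blocks) != expected:
        raise ValueError(f"expected blocks {expected}, got {list(blocks)}")
```

A line-based reader like this only works because every value stays on one line and no field is ever skipped, which is why the format insists on the literal `none` for an empty `ON_SCREEN_TEXT`.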

`N_SHOTS` semantics:
- 1: a single hero shot (5-10s typical) — product/landscape vignette.
- 2-3: classic short-form story arc.
- 4-6: extended narrative with multiple beats; good for 45-60s drama.
- 7-10: stretched-form drama; cost grows linearly with total duration.

## Rules

1. **Identity continuity** — `with.identity_anchor` is pasted byte-for-byte
   at the start of every shot's IMAGE_PROMPT and VIDEO_PROMPT. Do not
   paraphrase, summarize, or pronoun-substitute it. If shot 3's anchor
   text differs by one comma from shot 1's, you wrote it wrong.
2. **Visual concreteness** — replace abstract verbs with observable action:
   "a young woman in a red trench coat walks through rain-soaked neon
   streets" >> "a woman walking".
3. **IP-safe** — do not use franchise names, character names, brand terms,
   or "style of" references. Invent original names if needed.
4. **No multi-line values** — IMAGE_PROMPT, VIDEO_PROMPT, VOICEOVER,
   ON_SCREEN_TEXT must each be a single line.
5. **Aspect ratio explicit** — every IMAGE_PROMPT ends with the literal
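   token `--ar 9:16` (or `--ar 16:9`); every VIDEO_PROMPT carries the
   literal token `aspect_ratio: 9:16` (or `aspect_ratio: 16:9`) ahead of the
   fixed `no watermark, no logo, no subtitles` suffix shown in the template.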
6. **Duration math** — `sum(SHOT_i.DURATION_S) == OVERVIEW.DURATION_S` ±2s.
7. **Voiceover length** — total voiceover should be speakable in
   `DURATION_S` seconds (~3 Chinese chars/sec, ~2 English words/sec).
8. **Match the user's language** — write **all** fields (TITLE, STYLE,
   AUDIENCE, IDENTITY_ANCHOR, RENDER_STYLE, IMAGE_PROMPT, VIDEO_PROMPT,
   VOICEOVER, ON_SCREEN_TEXT) in the **same language the user wrote in**.
   - The current downstream image/video models — `google/gemini-3.1-flash-image-preview`
     and `bytedance/seedance-2.0` — both accept Chinese natively.
     Seedance (ByteDance) is in fact a Chinese-first model and tends to
     produce **more on-topic results** with Chinese prompts when the
     story itself is Chinese (e.g. 咖啡店偶遇, a chance meeting in a coffee
     shop; 国风武侠, Chinese-style wuxia; 校园回忆, campus memories).
   - Do **not** translate the user's Chinese topic into English just to
     fill IMAGE_PROMPT — that loses cultural detail and often hallucinates
     a Western-coded substitute.
   - Mixed-language input (English topic + Chinese voiceover note, or
     vice versa) → the *bulk* of prompts follow whichever language the
     **topic/story** is in; localised fields like VOICEOVER may follow
     the language explicitly named by the user.
   - English remains valid: pick it when the user wrote in English, or
     when the user explicitly asked for English prompts.
9. **Plain text only — no emoji, no decorative symbols.** The script
   flows through Python subprocesses on Windows consoles whose default
   code page (cp936/GBK) cannot encode `✅`, `❌`, `✨`, `🎬`, or any
   non-BMP character. The orchestrator will crash with a
   `UnicodeEncodeError` if any field contains one. Use p
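
Several of the rules above are mechanical enough to check before any image or video generation is paid for. The sketch below covers rules 1, 5, 6, and 9 only. It assumes the script has already been parsed into an `overview` dict and a list of per-shot dicts keyed by field name; the function name `check_script` is hypothetical. Because the template places `no watermark, no logo, no subtitles` after the aspect-ratio token in VIDEO_PROMPT, the sketch only checks that the token is present there rather than last.

```python
def check_script(overview: dict, shots: list) -> list:
    """Return a list of rule violations; an empty list means all checks passed."""
    problems = []
    anchor = overview["IDENTITY_ANCHOR"]
    ratio = overview["ASPECT_RATIO"]

    def check_encodable(block_name: str, fields: dict) -> None:
        # Rule 9: every value must survive the cp936/GBK console code page.
        for field, value in fields.items():
            try:
                value.encode("gbk")
            except UnicodeEncodeError:
                problems.append(f"{block_name}.{field}: not encodable as GBK (rule 9)")

    check_encodable("OVERVIEW", overview)
    for k, shot in enumerate(shots, start=1):
        name = f"SHOT_{k}"
        # Rule 1: both prompts begin with the exact anchor text.
        for field in ("IMAGE_PROMPT", "VIDEO_PROMPT"):
            if not shot[field].startswith(anchor):
                problems.append(f"{name}.{field}: anchor mismatch (rule 1)")
        # Rule 5: explicit aspect-ratio tokens.
        if not shot["IMAGE_PROMPT"].endswith(f"--ar {ratio}"):
            problems.append(f"{name}.IMAGE_PROMPT: missing trailing --ar {ratio} (rule 5)")
        if f"aspect_ratio: {ratio}" not in shot["VIDEO_PROMPT"]:
            problems.append(f"{name}.VIDEO_PROMPT: missing aspect_ratio: {ratio} (rule 5)")
        check_encodable(name, shot)

    # Rule 6: shot durations sum to the overview duration within 2 seconds.
    total = sum(int(shot["DURATION_S"]) for shot in shots)
    if abs(total - int(overview["DURATION_S"])) > 2:
        problems.append(f"durations sum to {total}s, expected {overview['DURATION_S']}s +/-2 (rule 6)")
    return problems
```

Returning a list of problems instead of raising on the first one lets the orchestrator report every violation in a single pass.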

More from this repository

advanced-dubbing-studio (Skill)

Submit audio or video for multilingual dubbing, poll status, and download dubbed audio. Use when the user asks for dubbing, 多语言配音 (multilingual dubbing), 视频翻译配音 (video translation dubbing), 译制片 (dubbed film), or wants a source clip dubbed into another language.

cron (Skill)

Use when the user asks to schedule recurring tasks, one-off reminders, timers, or cron-style jobs through the OpenSquilla cron tool.

deep-research (Skill)

Multi-round research with explicit methodology, evidence tracking, and citation-tagged synthesis. Trigger on 'deep dive', 'research report', 'literature review', 'investigate X across sources', 'multi-round investigation'. Distinct from the `summarize` skill, which is a single-pass condensation; this skill maintains a state file across iterations, tracks coverage, and produces a long-form report with per-claim citations. Three execution stages: plan (scope into sub-questions), iterate (record evidence per round), compile (synthesize report). The skill itself does not fetch the web — it tells the host agent which fetches to perform via OpenSquilla's existing web tools, and records what comes back.

docx (Skill)

Read, edit, or create Microsoft Word `.docx` files. Trigger this skill whenever the user mentions a Word document, .docx file, contract, report, brief, memo, or asks to extract text, modify an existing doc, generate one from a brief, or audit tracked changes. Three execution paths: text-and-structure extraction, in-place edit-by-run (preserves styles), and create-from-scratch with python-docx. Falls back to OOXML unzip-and-patch for layout work python-docx cannot reach.

git-diff (Skill)

Capture the current git diff (staged, working-tree, or staged file list) as text. Direct shell call for workflows that need repository diffs without an LLM agent loop.

github (Skill)

GitHub operations via `gh` CLI: issues, PRs, CI runs, code review, API queries. Use when: (1) checking PR status or CI, (2) creating/commenting on issues, (3) listing/filtering PRs or issues, (4) viewing run logs. NOT for: complex web UI interactions requiring manual browser flows (use browser tooling when available), bulk operations across many repos (script with gh api), or when gh auth is not configured.

history-explorer (Skill)

Query the per-turn DecisionEntry log for skill co-occurrence patterns, meta-skill usage stats, and the router fixture corpus. Returns a JSON summary suitable for downstream LLM consumption. Used by meta-skill-creator's harvest step but also useful standalone for 'which skills did I use most this week?'

html-to-pdf (Skill)

Render HTML (with CSS) to a PDF file. Trigger when the user wants to export a styled report, invoice, label, or any HTML/Jinja-rendered page to PDF. Uses WeasyPrint, which supports a meaningful subset of CSS Paged Media (page size, margins, headers/footers, page-break-before/after). Optional dependency — install via `pip install opensquilla[document-extras]` or `uv add weasyprint` because WeasyPrint pulls in native libraries (Pango, Cairo, fontconfig) that need OS-level packages.