script-compose
script-compose intake and planning tool for multimodal story-to-video workflows. Triages screenplay or story input, captures scripts with duration metadata, and extracts production anchors (locations, characters, cues). Stops before shot planning or asset composition, routing completed scripts to story-to-video-workflow for sequencing recommendations and handoff to image-compose, voice-compose, or video-compose. Use when you have a script or story concept ready for video planning.
git clone --depth 1 https://github.com/Utopai-Research/pai-pro /tmp/script-compose && cp -r /tmp/script-compose/skills/script-compose ~/.claude/skills/script-compose
Run only on explicit user intent — never on a file drop. A dropped text/PDF script is already a note in `workflow.json` with `data.body` as the source text and a derived mirror at `./assets/notes/<note_id>.md`; do nothing more until they ask.
Director defaults: a 30s beat is ONE moment; let wordless action carry beats instead of packing every one with dialogue; match the user's input language.
For multi-stage story-to-video work, this skill stops at script capture, shot notes, and production anchor extraction. After that, route back to `story-to-video-workflow` for sequencing recommendations; then load `image-compose`, `voice-compose`, or `video-compose` for execution.
Script intake should leave the next agent with enough planning context, not a full production plan. Capture the target duration when it is observable:
- Explicit user duration wins ("30 seconds", "2-minute short").
- Timestamp blocks come next; sum them.
- Otherwise estimate roughly and mark it as an estimate.
Store the result on the script note metadata as `target_duration_sec` and `duration_basis` when known. If the script/story implies more than roughly 3 minutes, call out scope before shot/video planning.
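The precedence above can be sketched as a small helper. This is a minimal illustration, not part of the skill: the function names, the parameter names, and the `duration_basis` strings other than "estimated from script length" are assumptions, and the 180-second threshold is one reading of "roughly 3 minutes".

```python
from typing import Optional, Sequence

SCOPE_LIMIT_SEC = 180  # assumption: "roughly 3 minutes" read as 180 seconds


def resolve_duration(
    explicit_sec: Optional[float] = None,
    timestamp_block_secs: Sequence[float] = (),
    rough_estimate_sec: Optional[float] = None,
) -> dict:
    """Apply the precedence: explicit duration > summed timestamps > estimate."""
    if explicit_sec is not None:
        # Hypothetical basis label; the source does not fix the wording.
        return {"target_duration_sec": explicit_sec,
                "duration_basis": "explicit user duration"}
    if timestamp_block_secs:
        # Hypothetical basis label; the source does not fix the wording.
        return {"target_duration_sec": sum(timestamp_block_secs),
                "duration_basis": "sum of timestamp blocks"}
    if rough_estimate_sec is not None:
        return {"target_duration_sec": rough_estimate_sec,
                "duration_basis": "estimated from script length"}
    return {}  # no defensible signal: leave both fields off the note


def needs_scope_callout(metadata: dict) -> bool:
    """True when the implied duration is long enough to flag scope first."""
    return metadata.get("target_duration_sec", 0) > SCOPE_LIMIT_SEC
```

For example, `resolve_duration(explicit_sec=30, timestamp_block_secs=[10, 20, 30])` keeps the explicit 30 seconds and ignores the timestamp sum.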
## 1. Triage → Capture
Classify the input, then capture as in §2. Never skip straight to §3.
- **Screenplay** (INT./EXT. + ALL-CAPS cues + dialogue) → use **verbatim**. For a dropped text/PDF file, read the upload note body from `./workflow.json` (`data.body`) or the derived mirror at `./assets/notes/<note_id>.md`. Pick a 2–5 word title in the user's language (use the script's own if present). Identify duration basis before capture; do not rewrite to fit it.
- **Story / concept** (prose, pitch, logline) → sketch ONE paragraph back (setting, characters, conflict, target duration) and ask if it's the shape. Iterate. On "yes/go", rewrite using the rules below, then capture.
- **Neither** → don't run; defer to `image-compose` / `video-compose`.
Torn between screenplay and story? Prefer screenplay — safer than rewriting.
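As a rough illustration of the screenplay signals named above, the heuristic below looks for `INT.`/`EXT.` slug lines and ALL-CAPS cue lines. It is a sketch under loose assumptions, not the skill's actual logic: the regexes and the `triage` name are hypothetical, and deciding that an input is "neither" remains an intent judgment that a pattern match cannot make (the sketch only returns it for empty input).

```python
import re

# Hypothetical patterns: a slug line starts with INT. or EXT.; a cue line is
# two or more characters drawn only from capitals, digits, and cue punctuation.
SLUG_RE = re.compile(r"(INT\.|EXT\.)")
CUE_RE = re.compile(r"[A-Z][A-Z0-9 .'()\-]+")


def triage(text: str) -> str:
    """Return 'screenplay', 'story', or 'neither' from surface signals only."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return "neither"
    has_slug = any(SLUG_RE.match(line) for line in lines)
    has_cue = any(
        CUE_RE.fullmatch(line) and not SLUG_RE.match(line) for line in lines
    )
    # Either signal alone counts as "torn", and the tie-break prefers
    # screenplay because verbatim capture is safer than rewriting.
    if has_slug or has_cue:
        return "screenplay"
    return "story"
```

A single prose sentence with no slug or cue lines comes back as `"story"`, which then goes through the sketch-and-confirm loop rather than straight to capture.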
**Rewrite rules (story → screenplay):**
- Format: `INT./EXT. LOCATION - TIME` slug, present-tense action, ALL-CAPS cue + dialogue. No scene numbering. No camera directions (that's `video-compose`).
- Preserve any user-quoted dialogue verbatim.
- Duration: match if stated; default 30–45s. Don't overshoot.
- Short input, longer target? Keep verbatim and ask "reads as ~Ns; extend?" — don't silently pad.
## 2. Capture — canvas note + title
ONE note. No split, no further action. Shared canvas rules live in the
project `PROJECT_AGENT.md`; all writes go through the mutator, and a PreToolUse hook
blocks direct `Write` / `Edit` on `workflow.json`.
1. `read` `./workflow.json` (read-only inspection — see if `title` is already set).
2. **Append the script note** via the mutator. Stamp `subtype: "script"` so the renderer applies the script-card chrome and `video-compose` can recognise it without label parsing:
```
node "$PAI_REPO_ROOT/server/cli/canvas_mutate.js" \
--op addNode \
--payload-json '{"node":{"type":"note","data":{"subtype":"script","label":"Script: <title>","body":"<full screenplay verbatim>","metadata":{"author":"agent","timestamp":"<ISO>","target_duration_sec":45,"duration_basis":"estimated from script length"}}}}'
```
Omit `target_duration_sec` / `duration_basis` only when there is no defensible signal.
Stdout returns `assigned.node_id` — keep it for §3 (shots derive from this id).
3. **Set the workflow title if empty:**
```
node "$PAI_REPO_ROOT/server/cli/canvas_mutate.js" --op setTitle --payload-json '{"title":"<title>"}'
```
4. Confirm with `Captured.`, then offer the next step as a choice rendered per the project `PROJECT_AGENT.md` § "Recommendation and choice shape". Recommended option: "Split it into <=15s shots and extract characters/locations/voices." Plus an escape to do something else.
STOP. Do NOT proceed to §3 without an explicit user command.
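The `addNode` payload from step 2 can be assembled programmatically, which also sidesteps shell quoting when the script body contains apostrophes. The sketch below is illustrative only: the helper names are hypothetical, and only the JSON field names and CLI flags are taken from the commands above.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional


def build_script_note_payload(
    title: str,
    body: str,
    target_duration_sec: Optional[float] = None,
    duration_basis: Optional[str] = None,
) -> str:
    """Serialise the script note; duration fields are omitted when unknown."""
    metadata = {
        "author": "agent",
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    if target_duration_sec is not None and duration_basis:
        metadata["target_duration_sec"] = target_duration_sec
        metadata["duration_basis"] = duration_basis
    node = {
        "type": "note",
        "data": {
            "subtype": "script",
            "label": f"Script: {title}",
            "body": body,  # full screenplay, verbatim
            "metadata": metadata,
        },
    }
    return json.dumps({"node": node}, ensure_ascii=False)


def mutator_argv(repo_root: str, op: str, payload_json: str) -> List[str]:
    """Build an argv list for the mutator; no shell quoting is involved."""
    return [
        "node",
        f"{repo_root}/server/cli/canvas_mutate.js",
        "--op", op,
        "--payload-json", payload_json,
    ]
```

Passing the list from `mutator_argv` to `subprocess.run` hands the payload over as one argument, so a line such as `You're late.` in the body does not terminate a single-quoted shell string.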
## 3. Analyze — on explicit user command
**Triggers** (judge intent): "split into shots / clips", "break this up", "pull the characters / locations", "who's in this", "analyze this script", "design the characters from this script".
**Not triggers:** "what's in this", "summarize", "tell me about it" — those are read-and-reply.
When triggered:
1. **Slug** — kebab-case of the working title. Collision → suffix `-2`, `-3`.
2. **Shot splits** (≤15s each; video model caps there): read the script note's `metadata.target_duration_sec` if present; otherwise estimate before splitting. Split on natural beats (slug changes, dialogue turns, location/time changes, meaningful appearance changes). Aim for shots **as close to 15s as possible** (default ≈ `ceil(total_seconds / 15)` shots) — not rigid; sub-divide smaller when a hard cut or strong beat genuinely demands it, but don't over-fragment just because the script's own time markers say so. Pacing: ~2.5 dialogue words/sec; silent action ~3–5s. **Never rewrite when splitting** — each shot body is a verbatim slice. Each shot note carries `subtype: "shot"` so `video-compose` can locate them structurally and the canvas renders the shot-card chrome. Build ONE `addBatch` payload with N shot notes + N derived edges from the script note, and apply it in one mutator call:
```
node "$PAI_REPO_ROOT/server/cli/canvas_mutate.js" \
--op addBatch \
--payload-json '{
"nodes": [
{"type":"note","data":{"subtype":"shot","label":"Shot 1 (0–15s)","body":"<slice>","metadata":{"author":"agent","timestamp":"<ISO>"}}},
{"type":"note","data":{"subtype":"shot","label":"Shot 2 (15–30s)","body":"<slice>","metadata":{"author":"agent","timestamp":"<ISO>"}}}
],
"edges": [
{"from":"<script_note_id>","to":"$0","kind":"derived"},
{"from":"<script_note_id>","to":"$1","kind":"derived"}
]
}'
```
`$N` placeholders are 0-indexed positions in `nodes`; the mutator resolves them to the assigned ids after running.
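The slug, pacing, and batch rules above can be sketched as follows. This is a hedged illustration rather than the skill's implementation: all function names are hypothetical, the 4-second silent beat is an assumed midpoint of the 3–5s range, and the slice boundaries are expected to come from the agent's own beat analysis.

```python
import math
import re
from typing import Iterable, List, Tuple

MAX_SHOT_SEC = 15      # per-shot cap stated above
WORDS_PER_SEC = 2.5    # dialogue pacing stated above
SILENT_BEAT_SEC = 4.0  # assumption: midpoint of the 3-5s range


def unique_slug(title: str, taken: Iterable[str]) -> str:
    """Kebab-case the title; on collision append -2, -3, and so on."""
    base = re.sub(r"\W+", "-", title.lower()).strip("-")
    taken = set(taken)
    if base not in taken:
        return base
    suffix = 2
    while f"{base}-{suffix}" in taken:
        suffix += 1
    return f"{base}-{suffix}"


def estimate_seconds(dialogue_words: int, silent_beats: int) -> float:
    """Rough duration when the script note carries no target duration."""
    return dialogue_words / WORDS_PER_SEC + silent_beats * SILENT_BEAT_SEC


def default_shot_count(total_seconds: float) -> int:
    """Default number of shots: ceil(total_seconds / 15), at least one."""
    return max(1, math.ceil(total_seconds / MAX_SHOT_SEC))


def build_shot_batch(
    script_note_id: str,
    slices: List[Tuple[int, int, str]],
    timestamp: str,
) -> dict:
    """Build N shot notes plus N derived edges using $N placeholders."""
    nodes, edges = [], []
    for index, (start, end, body) in enumerate(slices):
        if end - start > MAX_SHOT_SEC:
            raise ValueError(f"shot {index + 1} exceeds {MAX_SHOT_SEC}s")
        nodes.append({
            "type": "note",
            "data": {
                "subtype": "shot",
                "label": f"Shot {index + 1} ({start}–{end}s)",
                "body": body,  # verbatim slice, never rewritten
                "metadata": {"author": "agent", "timestamp": timestamp},
            },
        })
        # "$<index>" refers to this node's 0-indexed position in `nodes`.
        edges.append({"from": script_note_id, "to": f"${index}", "kind": "derived"})
    return {"nodes": nodes, "edges": edges}
```

Serialising the returned dict with `json.dumps(batch, ensure_ascii=False)` yields a string of the same shape as the `--payload-json` argument shown above. The count from `default_shot_count` is only a starting point; the natural beats of the script decide the final slices.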