story-to-video-workflow
The story-to-video workflow skill orchestrates the conversion of scripts or narratives into video by sequencing a pipeline of specialized capabilities: script composition, character and location anchoring, voice generation, video rendering, and timeline assembly. Use this when converting written content into finished video productions, as it ensures proper dependency ordering and user consent gates before initiating expensive rendering steps.
git clone --depth 1 https://github.com/Utopai-Research/pai-pro /tmp/story-to-video-workflow && cp -r /tmp/story-to-video-workflow/skills/story-to-video-workflow ~/.claude/skills/story-to-video-workflow
SKILL.md
# Story-to-video workflow
## Contract
- This skill wakes first for story/script/promo-to-video work.
- Sequence the pipeline, but do not call `generate_*` directly from this skill.
- Before any execution step, load the matching capability skill for that domain.
- Recommendations are planning, not consent. Ask and stop when the next step costs money or changes the pipeline.
## Default arc
Use this as the normal story-to-video ladder. It is a guide, not a lock; the user can skip, reorder, supply refs, or ask for a rough direct render.
1. Capture or adapt the story/script.
2. Split into <=15s shot notes and identify production anchors.
3. Create character and location anchors for video-bound shots, unless the user supplied refs or explicitly chose rough direct render.
4. Create voices for speaking characters or narration when voice matters.
5. Confirm the working clip plan: shot count, durations, continuity needs, and first missing dependency.
6. Ask render path: straight to video vs storyboard first.
7. Ask dispatch for multi-clip plans: hybrid, parallel, or sequential.
8. Render video clips.
9. Hand off clip order and preview to the Timeline flow.
Plan ahead internally, but only ask the next meaningful user-facing choice; the Consent and gates ladder fixes when render path and dispatch become askable.
## Skill routing
| Need | Load next |
|---|---|
| Script capture, rewrite, split, or analysis | `script-compose` |
| Character, location, storyboard, starting frame, or visual anchor | `image-compose` |
| Narration, dialogue read, character voice, or audio node | `voice-compose` |
| Clip render, continuation, audio refs, storyboard animation, or video prompt | `video-compose` |
| Scene/ref grouping or canvas layout frames | `groups-compose` |
Capability skills own CLI flags, node grammar, reference flags, and domain-specific recovery hints. `PROJECT_AGENT.md` owns the shared failure taxonomy. This workflow owns sequencing and handoff only.
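The routing table can be read as a plain lookup. The sketch below is illustrative only: the skill names come from the table, while the need keys (`script`, `visual_anchor`, and so on) and the `skill_to_load` helper are hypothetical names, not part of any project API.

```python
# Skill names are taken from the routing table; the need keys are
# hypothetical labels invented for this sketch.
SKILL_FOR_NEED = {
    "script": "script-compose",        # capture, rewrite, split, analysis
    "visual_anchor": "image-compose",  # character, location, storyboard, starting frame
    "voice": "voice-compose",          # narration, dialogue read, character voice
    "clip": "video-compose",           # clip render, continuation, video prompt
    "grouping": "groups-compose",      # scene/ref grouping, canvas layout frames
}


def skill_to_load(need: str) -> str:
    """Return the capability skill to load before any execution step."""
    try:
        return SKILL_FOR_NEED[need]
    except KeyError:
        # Unknown needs are surfaced rather than guessed at.
        raise ValueError(f"unknown need: {need!r}") from None
```

The workflow skill itself never appears as a value here, which mirrors the rule that it sequences and hands off but does not execute.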
## Consent and gates
- A recommended option is not consent by itself; wait for the user to answer.
- Paid video generation needs explicit user intent before staging.
- Draft-only, failed, and cancelled generations do not advance the story pipeline.
- Render path and multi-clip dispatch are later choices; ask them only once the story shape is meaningful enough to decide them.
- If the user asks for a one-off generation outside the story pipeline, route directly to the matching capability skill.
- These are soft gates, not bureaucracy. If the user explicitly asks to skip anchors, storyboards, or planning and make a rough direct render, honor that choice and carry it forward.
- Gating ladder. Ask each rung only once the prior one is real, never off a rough beat plan: script captured, then <=15s shot notes -> anchors, user refs, or an explicit rough-direct skip -> a clip plan real enough to discuss (shot count, durations, continuity) -> render path (full askability in the Render path section) -> dispatch (multi-clip plan only). Stop after the render-path question; surface dispatch only in a later turn unless the user's reply already names a combined choice such as "straight to video + parallel".
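One way to picture the gating ladder is as a function that returns the first rung that is not yet real. This is a minimal sketch under assumptions: `PipelineState`, its field names, and the rung labels are hypothetical and do not reflect the actual `workflow.json` schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineState:
    """Hypothetical snapshot of the story pipeline (not the real schema)."""
    script_captured: bool = False
    shot_notes_ready: bool = False
    # True when anchors exist, the user supplied refs, or the user
    # explicitly chose a rough direct render.
    anchors_resolved: bool = False
    clip_plan_confirmed: bool = False
    render_path: Optional[str] = None
    clip_count: int = 1
    dispatch: Optional[str] = None


def next_rung(state: PipelineState) -> str:
    """Return the first unmet rung; later rungs are never asked early."""
    if not state.script_captured:
        return "script"
    if not state.shot_notes_ready:
        return "shot_notes"
    if not state.anchors_resolved:
        return "anchors"
    if not state.clip_plan_confirmed:
        return "clip_plan"
    if state.render_path is None:
        return "render_path"
    # Dispatch is only a question for multi-clip plans.
    if state.clip_count > 1 and state.dispatch is None:
        return "dispatch"
    return "render"
```

Because the function returns a single rung, a caller naturally asks one question per turn; a combined reply such as "straight to video + parallel" would simply set both `render_path` and `dispatch` before the next call.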
## VO and dialogue invariants
- Spoken words live on script/shot notes and `audio_result.data.text`.
- `voice-compose` owns generating or preserving the exact spoken text.
- `audio_result.data.text` is the exact speech source of truth after voice generation.
- `video-compose` includes spoken text verbatim; Pattern 6 distinguishes final reads from timbre anchors.
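The verbatim invariant lends itself to a simple check. The sketch below assumes an `audio_result` is available as a dict with a `data.text` field, as named above; the helper name is hypothetical, and the check ignores the final-read versus timbre-anchor distinction.

```python
def prompt_has_verbatim_speech(audio_result: dict, video_prompt: str) -> bool:
    """Check that the video prompt contains the exact spoken text.

    `audio_result["data"]["text"]` is treated as the source of truth;
    any paraphrase in the prompt fails the check.
    """
    spoken = audio_result["data"]["text"]
    return spoken in video_prompt
```

A substring test is deliberately strict: a changed word or dropped punctuation mark counts as drift from the voiced line.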
## Recommendation shape
Follow the project `PROJECT_AGENT.md` § "Recommendation and choice shape". Recommend one concrete next step. Add a second option only when there is a real tradeoff.
## Planning checkpoint
Before recommending refs or video from a story, inspect `workflow.json` when needed and summarize only the decision-relevant state:
- Target duration from user duration, timestamps, or a rough estimate.
- Planned shot count, with each shot intended as <=15s.
- Characters, material variants, locations, and speaking/narration needs.
- First missing anchor blocking the next clip.
If the story implies more than roughly 3 minutes, recommend narrowing scope before clip planning.
After shot notes exist, if video-bound character/location anchors are missing, recommend anchors as the default next step. Include a rough-direct skip option when speed matters.
After anchors are present, offer a lightweight reference review or clip-plan confirmation before render choices when the next step is still ambiguous. Keep it short. For simple single-clip projects, user-supplied refs, or an explicit rough-direct choice, keep the checkpoint small and move on.
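The arithmetic behind the checkpoint is small enough to sketch. The constants below restate the <=15s shot cap and the roughly 3 minute scope threshold from this section; the function names are hypothetical, and the shot count is a lower bound rather than a plan.

```python
import math

MAX_SHOT_SECONDS = 15      # each shot note is intended as <=15s
SCOPE_LIMIT_SECONDS = 180  # "roughly 3 minutes"


def minimum_shot_count(target_seconds: float) -> int:
    """Fewest <=15s shots that can cover the target duration."""
    return max(1, math.ceil(target_seconds / MAX_SHOT_SECONDS))


def should_narrow_scope(target_seconds: float) -> bool:
    """True when the story implies more than roughly 3 minutes."""
    return target_seconds > SCOPE_LIMIT_SECONDS
```

For example, a 60 second target needs at least 4 shots, and a 240 second target would trigger the narrow-scope recommendation before any clip planning.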
## Render path
Ask this only after the script/shot plan is settled and either:
- video-bound character/location anchors are present,
- the user explicitly chose to skip anchors for a rough direct render,
- the user supplied usable refs, or
- the project is a simple one-off/single-clip render where anchors are not useful.
If shot notes exist but anchors are still missing and the user has not chosen rough direct render, return to the Planning checkpoint instead.
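The askability rule above reduces to a settled plan plus any one of four unlock conditions. A minimal sketch, with hypothetical parameter names:

```python
def render_path_askable(
    plan_settled: bool,
    anchors_present: bool = False,
    rough_direct: bool = False,
    user_refs: bool = False,
    simple_single_clip: bool = False,
) -> bool:
    """True when the render-path question may be asked.

    When this returns False with shot notes in place, the flow goes
    back to the Planning checkpoint instead of asking.
    """
    unlocked = anchors_present or rough_direct or user_refs or simple_single_clip
    return plan_settled and unlocked
```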
When ready and the user has not picked a path, ask using the project manual's choice shape with:
- header: `Render`
- question: `Choose render path.`
- options:
  - label: `Straight to video (Recommended)`
    description: `Fastest path to motion.`
  - label: `Storyboard first`
    description: `Generate storyboard images first for composition control.`
For storyboard-first, load `image-compose` Pattern 6. Generate one composite mosaic per clip or <=15s shot note, not one image per panel; each mosaic should be an `image_result` with `subtype: "storyboard"`.
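The one-mosaic-per-clip rule can be illustrated with a short sketch. Only `image_result` and `subtype: "storyboard"` come from this document; the `type`, `shot_index`, and `note` keys and the function name are hypothetical, and the real node grammar belongs to `image-compose`.

```python
def storyboard_nodes(shot_notes: list) -> list:
    """Plan one composite storyboard mosaic per shot note.

    Panels live inside each mosaic, so the node count equals the
    number of shot notes, never the number of panels.
    """
    return [
        {
            "type": "image_result",   # hypothetical key; value from the doc
            "subtype": "storyboard",  # from the doc
            "shot_index": index,      # hypothetical bookkeeping field
            "note": note,             # hypothetical: the source shot note
        }
        for index, note in enumerate(shot_notes)
    ]
```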
## Dispatch for multiple clips
Last rung of the Consent and gates ladder: render path picked and a multi-clip plan exists. Skip for one clip.
Use the project manual's choice shape with:
- header: `Dispatch`
- question: `Choose clip dispatch.`