Skill · 318 repo stars · updated 5d ago

story-to-video-workflow

The story-to-video workflow skill orchestrates the conversion of scripts or narratives into video by sequencing a pipeline of specialized capabilities: script composition, character and location anchoring, voice generation, video rendering, and timeline assembly. Use this when converting written content into finished video productions, as it ensures proper dependency ordering and user consent gates before initiating expensive rendering steps.

Repository: pai-pro

Install in Claude Code

git clone --depth 1 https://github.com/Utopai-Research/pai-pro /tmp/story-to-video-workflow && cp -r /tmp/story-to-video-workflow/skills/story-to-video-workflow ~/.claude/skills/story-to-video-workflow

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Story-to-video workflow

## Contract

- This skill wakes first for story/script/promo-to-video work.
- Own sequencing only. Before execution, load the matching capability skill; do not call `generate_*` here.
- Recommendations are not consent. Stop before paid generation or pipeline changes unless the user explicitly approved an autonomous workflow.

## Default arc

Use this ladder unless the user skips, reorders, supplies refs, or asks for a rough direct render:

1. Clarify only blockers.
2. Raw idea/story -> `script-compose` production script; existing screenplay -> capture/adapt.
3. `script-compose` splits the script into dialogue-aware shots of <=15s each and extracts characters, material variants, detailed locations/location variants, and speaker/VO needs.
4. `image-compose` creates useful visual anchors: base/variant character sheets and detailed location/detail anchors.
5. `voice-compose` creates reusable anchors for every speaker and VO/narrator.
6. Confirm shot count, durations, continuity needs, and first blocker.
7. Default render path: straight-to-video from refs. Storyboard only if the user requests it, the shot is hard to control, or it is needed for diagnosis.
8. Default dispatch: hybrid. Chain continuous dependent shots; render independent scenes/shots in parallel.
9. Render clips, assign Timeline `shot_id` when sequence order is unambiguous, then hand off to Timeline.

Plan ahead internally, but ask only the next meaningful user-facing choice; the gate ladder under Consent and gates determines when render path and dispatch may be asked.

## Skill routing

| Need | Load next |
|---|---|
| Script capture, rewrite, split, or analysis | `script-compose` |
| Character, location, storyboard, starting frame, or visual anchor | `image-compose` |
| Narration, dialogue read, character voice, or audio node | `voice-compose` |
| Clip render, continuation, audio refs, storyboard animation, or video prompt | `video-compose` |
| Scene/ref grouping or canvas layout frames | `groups-compose` |

Capability skills own CLI flags, node grammar, refs, and recovery hints. `PROJECT_AGENT.md` owns shared failure handling.
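
The routing table can be read as a plain lookup from need to capability skill. The sketch below is illustrative only: the `ROUTING` dictionary and the `skill_for` helper are hypothetical names, and the keys simply restate the table rows rather than any matching logic the skill actually uses.

```python
from typing import Optional

# Hypothetical lookup restating the routing table; keys are lowercase need labels.
ROUTING = {
    "script capture": "script-compose",
    "script rewrite": "script-compose",
    "script split": "script-compose",
    "script analysis": "script-compose",
    "character": "image-compose",
    "location": "image-compose",
    "storyboard": "image-compose",
    "starting frame": "image-compose",
    "visual anchor": "image-compose",
    "narration": "voice-compose",
    "dialogue read": "voice-compose",
    "character voice": "voice-compose",
    "audio node": "voice-compose",
    "clip render": "video-compose",
    "continuation": "video-compose",
    "audio refs": "video-compose",
    "storyboard animation": "video-compose",
    "video prompt": "video-compose",
    "scene/ref grouping": "groups-compose",
    "canvas layout frames": "groups-compose",
}


def skill_for(need: str) -> Optional[str]:
    """Return the capability skill to load next, or None when no row matches."""
    return ROUTING.get(need.strip().lower())
```

An exact-match lookup keeps `storyboard` (an image anchor) distinct from `storyboard animation` (a video render), which a substring match would confuse.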

## Consent and gates

- Draft-only, failed, and cancelled generations do not advance the pipeline.
- One-off generation outside the story pipeline routes directly to the capability skill.
- Honor explicit rough-direct/skip choices.
- Gate ladder: script -> shot notes -> anchors/user refs/rough-direct -> real clip plan -> render path -> dispatch. Ask each rung only after the prior one is real; stop after render-path unless the user already names dispatch too.
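
One way to picture the gate ladder is as an ordered list where the next askable rung is the first one not yet settled. This is a minimal sketch under that reading; `GATE_LADDER`, `next_gate`, and the idea of tracking settled rungs in a set are assumptions for illustration, not part of the skill.

```python
from typing import Iterable, Optional

# Rung names copied from the gate ladder; the data structure itself is hypothetical.
GATE_LADDER = [
    "script",
    "shot notes",
    "anchors/user refs/rough-direct",
    "real clip plan",
    "render path",
    "dispatch",
]


def next_gate(settled: Iterable[str]) -> Optional[str]:
    """Return the first unsettled rung; later rungs are not askable until it is real."""
    done = set(settled)
    for rung in GATE_LADDER:
        if rung not in done:
            return rung
    return None  # every rung is settled
```

Because the scan stops at the first gap, a later rung that happens to be settled early (for example, a user who names a render path before the script exists) does not let the ladder skip ahead.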

## VO and dialogue invariants

- Script/shot notes carry dialogue/VO until final audio exists.
- `audio_result.data.text` is source of truth only for approved final narration/line reads.
- `video-compose` includes spoken text verbatim and treats voice samples as timbre anchors.
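
The invariants above amount to a small precedence rule for spoken text. The sketch below assumes a hypothetical `approved_final` flag on the audio result; only the `data.text` path comes from this document, and the real result shape may differ.

```python
from typing import Optional


def spoken_text(shot_note_text: str, audio_result: Optional[dict] = None) -> str:
    """Pick the text a video prompt should quote verbatim.

    `approved_final` is a hypothetical flag marking approved final narration
    or line reads; until it is set, the script/shot note stays authoritative.
    """
    if audio_result and audio_result.get("approved_final"):
        return audio_result["data"]["text"]
    return shot_note_text
```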

## Recommendation shape

Follow the project `PROJECT_AGENT.md` § "Recommendation and choice shape". Recommend one concrete next step. Add a second option only when there is a real tradeoff.

## Planning checkpoint

Before recommending refs/video, inspect `workflow.json` when needed and summarize only:

- Target duration from user duration, timestamps, or a rough estimate.
- Planned <=15s shot count.
- Characters, material variants, detailed locations/location variants, close/detail needs, speakers/VO.
- First missing anchor blocking the next clip.

If the story implies more than roughly 3 minutes, recommend narrowing scope before clip planning.
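
The duration arithmetic behind this checkpoint can be sketched as follows. The function name and return shape are hypothetical; the 15-second cap and the roughly 3-minute threshold come from the text above. The shot count is a lower bound, since dialogue-aware splitting may produce more, shorter shots.

```python
import math

MAX_SHOT_SECONDS = 15       # the "<=15s" shot limit
SCOPE_LIMIT_SECONDS = 180   # "roughly 3 minutes"


def plan_summary(target_seconds: float) -> dict:
    """Estimate the minimum shot count and flag stories that should be narrowed."""
    return {
        "target_seconds": target_seconds,
        "min_shot_count": math.ceil(target_seconds / MAX_SHOT_SECONDS),
        "recommend_narrowing": target_seconds > SCOPE_LIMIT_SECONDS,
    }
```

For example, a 61-second target needs at least five shots, because four 15-second shots cover only 60 seconds.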

After shot notes, missing video-bound character/location/voice anchors are the default next step; include a rough-direct skip when speed matters. Once anchors/user refs/rough-direct are settled, offer only a short ref review or clip-plan confirmation if ambiguity remains.

## Render path

Ask only after the script/shot plan is settled and anchors, usable refs, rough-direct, or a simple single-clip case make rendering real. If anchors are still missing, return to Planning checkpoint.
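
Read as a condition, the rule above says the plan must be settled and at least one thing must make rendering real. A minimal sketch, with hypothetical parameter names:

```python
def render_path_askable(
    plan_settled: bool,
    has_anchors: bool = False,
    has_usable_refs: bool = False,
    rough_direct: bool = False,
    single_clip: bool = False,
) -> bool:
    """True when the render-path question may be asked; otherwise return to planning."""
    rendering_is_real = has_anchors or has_usable_refs or rough_direct or single_clip
    return plan_settled and rendering_is_real
```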

Use project choice shape:

- header: `Render`
- question: `Choose render path.`
- options:
  - label: `Straight to video (Recommended)`
    description: `Fastest path to motion.`
  - label: `Storyboard first`
    description: `Generate storyboard images first for composition control.`

For storyboard-first, load `image-compose` Pattern 6: one composite mosaic per clip/<=15s shot note, subtype `storyboard`.

## Dispatch for multiple clips

Ask only after render path is picked and a multi-clip plan exists. Skip for one clip. Use project choice shape:

- header: `Dispatch`
- question: `Choose clip dispatch.`
- options: order these by the observable story signals below; suffix the first label with `(Recommended)`.
  - label: `Hybrid`
    description: `Chain within continuous scenes; render separate scenes independently.`
  - label: `Parallel`
    description: `Render all clips independently.`
  - label: `Sequential`
    description: `Each clip continues from the previous one; boundaries default to a hard cut to a new angle (avoids the same-shot seam); keep a boundary same-shot only for an unbroken oner.`

Signals:

- Continuous scene/state -> sequential (hard-cut handoffs between clips).
- A single unbroken action the viewer must read as ONE motion -> one <=15s clip, else sequential with a same-shot handoff.
- Separate scenes/time jumps/wardrobe changes/montage -> parallel.
- Continuous clusters separated by hard cuts -> hybrid.

Do not chain video refs across location, time, wardrobe/state, dream/reality, or montage breaks.
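
The dispatch signals can be approximated by grouping consecutive shots into chains and looking at the chain shapes. The sketch below is an assumption-heavy illustration: the `break_before` flag (true when a location, time, wardrobe/state, dream/reality, or montage break precedes the shot), the helper names, and the order of the non-recommended options are all hypothetical. It does not model the single-unbroken-action case.

```python
from typing import Dict, List, Optional

LABELS = ["Hybrid", "Parallel", "Sequential"]


def chain_shots(shots: List[Dict]) -> List[List[str]]:
    """Group consecutive shots into chains, starting a new chain at every break."""
    chains: List[List[str]] = []
    for shot in shots:
        if not chains or shot["break_before"]:
            chains.append([shot["id"]])    # never chain refs across a break
        else:
            chains[-1].append(shot["id"])  # continuous with the previous shot
    return chains


def recommend_dispatch(chains: List[List[str]]) -> Optional[str]:
    """Map chain shapes to a dispatch label; None means skip the question."""
    if sum(len(chain) for chain in chains) < 2:
        return None                        # one clip: dispatch is skipped
    if len(chains) == 1:
        return "Sequential"                # one continuous scene/state
    if all(len(chain) == 1 for chain in chains):
        return "Parallel"                  # every shot stands alone
    return "Hybrid"                        # continuous clusters split by hard cuts


def dispatch_options(recommended: str) -> List[str]:
    """Put the recommended label first and suffix it with "(Recommended)"."""
    ordered = [recommended] + [label for label in LABELS if label != recommended]
    ordered[0] += " (Recommended)"
    return ordered
```

With three shots where only the third follows a break, `chain_shots` yields two chains, `recommend_dispatch` picks `Hybrid`, and `dispatch_options` lists it first.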

## After media results

After terminal `generate_*`:

1. If it is only draft-stage JSON, report the price/status and stop.
2. If `ok:false`, follow project failure handling and do not advance the pipeline.
3. If `ok:true`, identify the landed node id from the result or canvas state.
4. Read `workflow.json` if shots, refs, voices, clips, or reel order affect the next decision.
5. Recommend exactly one next useful filmmaking move.
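
The first three steps above form a simple triage on the result. In the sketch below only the `ok` field comes from this document; the `stage` field, the `"draft"` value, and the returned action names are hypothetical placeholders for whatever the real result JSON carries.

```python
def triage_result(result: dict) -> str:
    """Classify a terminal generate_* result into the next handling step."""
    if result.get("stage") == "draft":   # hypothetical marker for draft-stage JSON
        return "report-and-stop"         # report price/status; nothing landed
    if not result.get("ok"):
        return "handle-failure"          # project failure handling; do not advance
    return "advance"                     # find the landed node, then recommend one move
```

Checking the draft case first mirrors the ordering of the list: a draft-only result never advances the pipeline, regardless of its `ok` value.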

Typical priority:

- Script note landed -> recommend

From the same repository

groups-compose (Skill)

Designs and maintains semantic groupings and readable layouts on the filmmaking canvas — scenes, character-reference sets, act beats, and other titled visual frames. Use when nodes on the canvas cluster around a shared meaning and would read more clearly if arranged together and wrapped in a frame. Don't force it — groups are a view concern, not an organizing tax.

image-compose (Skill)

script-compose (Skill)

video-compose (Skill)

Generates and prompts video clips on the filmmaking canvas. Use when the user asks to generate, render, animate, continue, restyle, edit, shoot, or compose a video clip; render script or shot notes as video; animate a storyboard, starting frame, image, character, location, or reference; use image, video, audio, storyboard, starting-frame, or voice refs; compose an ad, brand film, product promo, music-video shot, or video sequence; or before calling generate_video.js. Owns video CLI flags, refs, prompt construction, audio-ref handling, and video-specific failure hints.

voice-compose (Skill)

Designs and attaches voice samples or final narration/line audio on the filmmaking canvas via the local generate_voice.js CLI. Use before calling generate_voice.js; when the user asks to give a character a voice, preview how a character sounds, create reusable timbre anchors for every speaking character or VO/narration, or create exact narration/VO/final line audio.