Skill · 318 repo stars · updated 5d ago

script-compose

Intake and planning tool for multimodal story-to-video workflows. Triages screenplay or story input, captures scripts with duration metadata, and extracts production anchors (locations, characters, cues). Stops before shot planning or asset composition and routes completed scripts to story-to-video-workflow for sequencing recommendations and handoff to image-compose, voice-compose, or video-compose. Use when you have a script or story concept ready for video planning.

Repository: pai-pro

Install in Claude Code

git clone --depth 1 https://github.com/Utopai-Research/pai-pro /tmp/script-compose && cp -r /tmp/script-compose/skills/script-compose ~/.claude/skills/script-compose

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

Run only on explicit user intent, never on file drop. A dropped text/PDF already exists as a note (`data.body`) and as a mirror file (`./assets/notes/<note_id>.md`).

Defaults: a 30s beat is one moment; match the input language; with characters, prefer meaningful dialogue and let narration support rather than carry the scene, but let wordless action carry a beat instead of packing every one with dialogue.

Stop at script capture, shot notes, and anchor extraction. Route multi-stage work back to `story-to-video-workflow`.

Capture target duration when observable:

- Explicit user duration wins ("30 seconds", "2-minute short").
- Timestamp blocks come next; sum them.
- Otherwise estimate roughly and mark it as an estimate.

Store `target_duration_sec` and `duration_basis` when known. If implied runtime is >~3 minutes, call out scope before shot/video planning.
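
The precedence above (explicit duration, then summed timestamp blocks, then a labelled estimate) can be sketched as follows. This is a minimal illustration, not the skill's own code: the function names, the `M:SS-M:SS` timestamp format, and the `duration_basis` strings other than "estimated from script length" are assumptions; only the two stored field names come from the text above.

```python
import re
from typing import Optional

# Assumed phrasing for explicit durations: "30 seconds", "45s", "2-minute short".
_EXPLICIT = re.compile(
    r"(\d+(?:\.\d+)?)[\s-]*(seconds?|secs?|minutes?|mins?|s|m)\b", re.IGNORECASE
)
# Assumed timestamp-block format: "0:00-0:15" (hyphen or en dash).
_RANGE = re.compile(r"(\d+):(\d{2})\s*[-–]\s*(\d+):(\d{2})")


def explicit_seconds(user_request: str) -> Optional[int]:
    """Return the duration the user stated, in seconds, or None."""
    match = _EXPLICIT.search(user_request)
    if match is None:
        return None
    factor = 60 if match.group(2).lower().startswith("m") else 1
    return round(float(match.group(1)) * factor)


def timestamp_seconds(script: str) -> Optional[int]:
    """Sum every start-end timestamp block found in the script, or None."""
    total, found = 0, False
    for m1, s1, m2, s2 in _RANGE.findall(script):
        start = int(m1) * 60 + int(s1)
        end = int(m2) * 60 + int(s2)
        if end > start:
            total += end - start
            found = True
    return total if found else None


def resolve_duration(user_request: str, script: str,
                     rough_estimate_sec: Optional[float] = None) -> dict:
    """Apply the precedence: explicit > timestamps > estimate > nothing."""
    sec = explicit_seconds(user_request)
    if sec is not None:
        return {"target_duration_sec": sec, "duration_basis": "explicit user duration"}
    sec = timestamp_seconds(script)
    if sec is not None:
        return {"target_duration_sec": sec, "duration_basis": "sum of timestamp blocks"}
    if rough_estimate_sec is not None:
        return {"target_duration_sec": round(rough_estimate_sec),
                "duration_basis": "estimated from script length"}
    return {}  # no defensible signal: store neither field


def needs_scope_callout(target_duration_sec: float) -> bool:
    """True when the implied runtime exceeds roughly 3 minutes."""
    return target_duration_sec > 180
```

Returning an empty dict when no signal exists keeps the "store when known" behaviour explicit instead of writing a guessed value.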

## 1. Triage → Capture

Classify the input, then capture as in §2. Never skip straight to §3.

- **Screenplay** (INT./EXT. + ALL-CAPS cues + dialogue) → use **verbatim**. For dropped text/PDF, read `workflow.json` `data.body` or `./assets/notes/<note_id>.md`. Pick a 2–5 word title; identify duration basis; do not rewrite to fit.
- **Story / concept** (prose, pitch, logline) → sketch ONE paragraph back (setting, characters, conflict, target duration) and ask if it's the shape. Iterate. On "yes/go", rewrite using the rules below, then capture.
- **Neither** → don't run; defer to `image-compose` / `video-compose`.

Torn between screenplay and story? Prefer screenplay — safer than rewriting.
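
As a rough aid to the triage above, the screenplay signals (an `INT./EXT.` slug plus an ALL-CAPS cue followed by dialogue) can be checked mechanically. The sketch below is a hypothetical heuristic, not part of the skill: the function names and the 40-character cue limit are assumptions, and it cannot distinguish "story" from "neither", which still requires judging user intent.

```python
import re

# A slugline starts with INT., EXT., or INT./EXT. followed by a location.
_SLUG = re.compile(r"^\s*(INT\./EXT\.|INT\.|EXT\.)\s+\S", re.MULTILINE)


def _is_cue(line: str) -> bool:
    """A cue is a short ALL-CAPS line that is not itself a slugline."""
    stripped = line.strip()
    return (
        0 < len(stripped) <= 40          # assumed upper bound for a cue line
        and any(ch.isalpha() for ch in stripped)
        and stripped == stripped.upper()
        and not _SLUG.match(stripped)
    )


def looks_like_screenplay(text: str) -> bool:
    """True when the text has a slugline and a cue followed by a non-empty line."""
    lines = text.splitlines()
    has_slug = bool(_SLUG.search(text))
    has_cue_with_dialogue = any(
        _is_cue(lines[i]) and lines[i + 1].strip() != ""
        for i in range(len(lines) - 1)
    )
    return has_slug and has_cue_with_dialogue
```

For example, `looks_like_screenplay("INT. KITCHEN - NIGHT\n\nMARA\nYou came back.")` is true, while a one-sentence logline is not.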

**Rewrite rules (story → screenplay):**
- Format: `INT./EXT. LOCATION - TIME` slug, present-tense action, ALL-CAPS cue + dialogue. No scene numbering. No camera directions (that's `video-compose`).
- Preserve user-quoted dialogue verbatim.
- With characters, include dialogue that reveals motive/conflict/relationship; avoid narration-only exposition unless VO-driven.
- Pace speech at ~2.2–2.5 words/sec plus reaction/action room.
- Duration: match if stated; default 30–45s. Don't overshoot.
- Short input, longer target? Keep verbatim and ask "reads as ~Ns; extend?" — don't silently pad.
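
The pacing rule can be turned into a quick runtime check. The sketch below assumes dialogue is passed as a list of spoken lines and that reaction/action room is supplied by the caller in seconds; the function names are illustrative only.

```python
from typing import Iterable, Tuple

FAST_WPS = 2.5  # upper end of the speech pace given in the rules
SLOW_WPS = 2.2  # lower end of the speech pace given in the rules


def estimate_runtime(dialogue_lines: Iterable[str],
                     action_room_sec: float = 0.0) -> Tuple[float, float]:
    """Return (fastest, slowest) runtime in seconds, rounded to 0.1s."""
    words = sum(len(line.split()) for line in dialogue_lines)
    fastest = round(words / FAST_WPS + action_room_sec, 1)
    slowest = round(words / SLOW_WPS + action_room_sec, 1)
    return fastest, slowest


def overshoots(dialogue_lines: Iterable[str], target_sec: float,
               action_room_sec: float = 0.0) -> bool:
    """True when even the fastest read does not fit the target duration."""
    fastest, _ = estimate_runtime(dialogue_lines, action_room_sec)
    return fastest > target_sec
```

With 55 words of dialogue and 5 seconds of action room, the estimate is 27.0 to 30.0 seconds, which fits a 30-second target at the fast pace but leaves no slack at the slow pace.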

## 2. Capture — canvas note + title

ONE note. No split. Canvas writes go through the mutator, never direct `workflow.json`.

1. `read` `./workflow.json` (read-only inspection — see if `title` is already set).
2. **Append the script note** via the mutator with `subtype: "script"`:
   ```
   node "$PAI_REPO_ROOT/server/cli/canvas_mutate.js" \
     --op addNode \
     --payload-json '{"node":{"type":"note","data":{"subtype":"script","label":"Script: <title>","body":"<full screenplay verbatim>","metadata":{"author":"agent","timestamp":"<ISO>","target_duration_sec":45,"duration_basis":"estimated from script length"}}}}'
   ```
   Omit `target_duration_sec` / `duration_basis` only when there is no defensible signal.
   Stdout returns `assigned.node_id` — keep it for §3 (shots derive from this id).
3. **Set the workflow title if empty:**
   ```
   node "$PAI_REPO_ROOT/server/cli/canvas_mutate.js" --op setTitle --payload-json '{"title":"<title>"}'
   ```
4. Confirm with `Captured.`, then offer the next step as a choice rendered per the project `PROJECT_AGENT.md` § "Recommendation and choice shape". Recommended option: "Split it into ≤15s shots and extract characters/locations/voices." Also include an escape option to do something else.

STOP. Do NOT proceed to §3 without an explicit user command.
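
Screenplays often contain apostrophes and quotes, which break a hand-written single-quoted `--payload-json` argument. One way around that is to build the payload with a JSON serializer and pass it as a separate argv element, as in the sketch below. The helper name and its parameters are hypothetical; the CLI path, `--op addNode`, `--payload-json`, and the payload fields are taken from step 2 above. The sketch only builds the argument list and does not invoke the mutator.

```python
import json
import os
from datetime import datetime, timezone
from typing import List, Optional


def build_add_script_note_argv(title: str, screenplay: str,
                               target_duration_sec: Optional[int] = None,
                               duration_basis: Optional[str] = None,
                               repo_root: Optional[str] = None,
                               timestamp: Optional[str] = None) -> List[str]:
    """Build the argv for the addNode call that appends the script note."""
    metadata = {
        "author": "agent",
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
    }
    # Duration fields are included only when there is a defensible signal.
    if target_duration_sec is not None and duration_basis:
        metadata["target_duration_sec"] = target_duration_sec
        metadata["duration_basis"] = duration_basis
    payload = {
        "node": {
            "type": "note",
            "data": {
                "subtype": "script",
                "label": f"Script: {title}",
                "body": screenplay,  # full screenplay, verbatim
                "metadata": metadata,
            },
        }
    }
    root = repo_root if repo_root is not None else os.environ.get("PAI_REPO_ROOT", "")
    cli = f"{root}/server/cli/canvas_mutate.js"
    return ["node", cli, "--op", "addNode",
            "--payload-json", json.dumps(payload, ensure_ascii=False)]
```

The resulting list can be handed to `subprocess.run(argv, capture_output=True, text=True, check=True)`; `assigned.node_id` is then read from stdout as described in step 2.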

## 3. Analyze — on explicit user command

**Triggers** (judge intent): "split into shots / clips", "break this up", "pull the characters / locations", "who's in this", "analyze this script", "design the characters from this script".
**Not triggers:** "what's in this", "summarize", "tell me about it" — those are read-and-reply.

When triggered:

1. **Slug** — kebab-case of the working title. Collision → suffix `-2`, `-3`.
2. **Shot splits** (≤15s each): use `metadata.target_duration_sec` or estimate, and split on natural beats (slug/dialogue/location/time/appearance changes).
   - For >15s material, keep resulting shots as close to 15s as natural (default ≈ `ceil(total_seconds / 15)` shots). Split shorter only for hard cuts, dialogue turns, continuity shifts, or strong beats; don't over-fragment just because the script's own time markers say so.
   - Pace speech at ~2.2–2.5 words/sec plus reaction/action room; silent action ~3–5s.
   - If dialogue cannot fit naturally, split it; reduce only when the user asked for compression.
   - **Never rewrite**: shot bodies are verbatim slices.
   - Each shot note has `subtype: "shot"`.

   Build one `addBatch` with N shot notes + N derived edges:
   ```
   node "$PAI_REPO_ROOT/server/cli/canvas_mutate.js" \
     --op addBatch \
     --payload-json '{
       "nodes": [
         {"type":"note","data":{"subtype":"shot","label":"Shot 1 (0–15s)","body":"<slice>","metadata":{"author":"agent","timestamp":"<ISO>"}}},
         {"type":"note","data":{"subtype":"shot","label":"Shot 2 (15–30s)","body":"<slice>","metadata":{"author":"agent","timestamp":"<ISO>"}}}
       ],
       "edges": [
         {"from":"<script_note_id>","to":"$0","kind":"derived"},
         {"from":"<script_note_id>","to":"$1","kind":"derived"}
       ]
     }'
   ```
   `$N` placeholders are 0-indexed positions in `nodes`; the mutator resolves them to the assigned ids after running. The reply's `assigned.node_ids` is the array of shot ids in the same order.
3. **Anchor extraction** — from the shot bodies, extract only downstream needs:
   - **Characters**: recurring/visually important people/entities. Include one-line base visuals only when given.
   - **Variants**: same character with materially different on-screen look by scene/shot: age jump, costume change, injury, disguise, transformation, wet/dirty/bloodied state if it must persist across shots. Do not create variants for transient expressions or tiny props.
   - **Locations**: distinct settings plus same-setting variants when framing/scale, time, weather, lighting, dressing, story st
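
As a sketch of step 2 above, the default shot count and the `addBatch` payload could be assembled like this. The helper names and the `(body, duration)` input shape are assumptions for illustration; the payload fields, the `$N` placeholders, and the label style follow the example in step 2. Choosing where to cut stays a judgment call on natural beats, so the slices are taken as given.

```python
import math
from typing import Dict, List, Tuple

MAX_SHOT_SEC = 15  # per-shot ceiling from step 2


def default_shot_count(total_seconds: float) -> int:
    """Default number of shots: ceil(total_seconds / 15), at least one."""
    return max(1, math.ceil(total_seconds / MAX_SHOT_SEC))


def build_shot_batch(script_note_id: str,
                     shots: List[Tuple[str, int]],
                     timestamp: str) -> Dict[str, list]:
    """Build the addBatch payload from verbatim slices in script order.

    shots: (body_slice, duration_sec) pairs; bodies are not rewritten here.
    """
    nodes, edges = [], []
    start = 0
    for index, (body, duration) in enumerate(shots):
        if duration > MAX_SHOT_SEC:
            raise ValueError(f"shot {index + 1} is {duration}s; split it further")
        end = start + duration
        nodes.append({
            "type": "note",
            "data": {
                "subtype": "shot",
                "label": f"Shot {index + 1} ({start}–{end}s)",
                "body": body,
                "metadata": {"author": "agent", "timestamp": timestamp},
            },
        })
        # "$N" refers to the 0-indexed position of the node in this batch.
        edges.append({"from": script_note_id, "to": f"${index}", "kind": "derived"})
        start = end
    return {"nodes": nodes, "edges": edges}
```

Serializing the result with `json.dumps` gives the value for `--payload-json`; a 45-second script yields a default of three shots.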

From the same repository

groups-compose (Skill)

Designs and maintains semantic groupings and readable layouts on the filmmaking canvas — scenes, character-reference sets, act beats, and other titled visual frames. Use when nodes on the canvas cluster around a shared meaning and would read more clearly if arranged together and wrapped in a frame. Don't force it — groups are a view concern, not an organizing tax.

image-compose (Skill)

story-to-video-workflow (Skill)

video-compose (Skill)

Generates and prompts video clips on the filmmaking canvas. Use when the user asks to generate, render, animate, continue, restyle, edit, shoot, or compose a video clip; render script or shot notes as video; animate a storyboard, starting frame, image, character, location, or reference; use image, video, audio, storyboard, starting-frame, or voice refs; compose an ad, brand film, product promo, music-video shot, or video sequence; or before calling generate_video.js. Owns video CLI flags, refs, prompt construction, audio-ref handling, and video-specific failure hints.

voice-compose (Skill)

Designs and attaches voice samples or final narration/line audio on the filmmaking canvas via the local generate_voice.js CLI. Use before calling generate_voice.js; when the user asks to give a character a voice, preview how a character sounds, create reusable timbre anchors for every speaking character or VO/narration, or create exact narration/VO/final line audio.