Skill318 repo starsupdated 4d ago

video-compose

The video-compose skill dispatches video generation requests by constructing prompts, managing references, and invoking the generate_video CLI with appropriate flags. Use it when the user requests video creation, animation, editing, or composition, including tasks like rendering scripts as video, animating storyboards, applying audio or image references, or generating ads and product promos. The skill enforces production defaults like staging all outputs and enabling audio by default unless explicitly disabled.

View source Repository: pai-pro

Install in Claude Code

Copy

git clone --depth 1 https://github.com/Utopai-Research/pai-pro /tmp/video-compose && cp -r /tmp/video-compose/skills/video-compose ~/.claude/skills/video-compose

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

Intent dispatcher. Patterns name trigger, call, edges, and prompt reference.

## Hard defaults

- Stage by default per `PROJECT_AGENT.md`.
- Audio on by default; pass `--no-audio` only for explicit silent/no-audio requests. Trailer/portrait/cinematic framing is NOT a trigger; audio is the baseline, not optional polish.
- Reference-to-clip default: use available character/variant/location/voice refs directly. Storyboard only if requested, hard to control, or needed for diagnosis.
- Preserve scripted dialogue/VO exactly unless the user asks for rewrite.

## First-use video mode

For the ask-once flow and per-mode prices, see the project `PROJECT_AGENT.md` § "First-use generation choices". Pass `--resolution` only for `480p Draft` or `1080p Final`.

## CLI shape

```
node "$PAI_REPO_ROOT/server/cli/generate_video.js" --prompt "..." [--duration <seconds>] [--aspect-ratio 16:9]
  [--resolution <480p|1080p>] [--no-audio]
  [--label "..."] [--ref-source-id <id> ...] [--ref-audio-source-id <audio_id> ...]
  [--source-node-id <id>] [--shot-id <N>]
```

Calls go via `--stage` — see the project `PROJECT_AGENT.md` § "Draft gate".

`--label` defaults to truncated prompt. Use `--ref-source-id` for image/video refs, `--ref-audio-source-id` for audio refs, and `--source-node-id` for the authoring note. Mirror external URLs first. Do not set `--shot-id` during speculative/partial generation unless user asks for a reel position; story sequences assign Timeline order after planned clips land.

Match stated single-clip duration with `--duration`; omit for 15s default. Split or chain >15s totals.

Each clip costs real money even after staging — only stage after the user has explicitly asked for a video.

## Reference caps (video-generation)

≤9 image refs, ≤3 audio refs, ≤3 video refs. Audio/video refs must be **1.8s-15.2s each**; video refs also cap at **15s aggregate**. Audio refs need image/video anchor. Read durations from `workflow.json`; on failure, use returned `limits` + `sent`.

## Reference roles — vocabulary

Prompt wording binds each ref role:

| Role | Flag | Wording in prompt |
|---|---|---|
| Character identity | `--ref-source-id` (image) | "the character in @Image1" |
| Location / setting | `--ref-source-id` (image) | "the location shown in @Image1" |
| Opening frame | `--ref-source-id` (image) | "opening frame @Image1, …" |
| Closing frame | `--ref-source-id` (image) | "closing on the frame from @Image1" |
| Source clip — continue (next clip in a chain) | `--ref-source-id` (video) | **Default = hard cut:** "Hard cut from @Video1: open on a NEW camera angle; do not match its final frame." Same-shot ("Continue from @Video1 … maintain camera position") only for an authored held beat / oner / explicit user request — see [`references/video-extension.md`](references/video-extension.md) |
| Source clip — transform | `--ref-source-id` (video) | "Re-render @Video1 in …" |
| Camera-move source | `--ref-source-id` (video) | "camera moves match @Video1" |
| Action source | `--ref-source-id` (video) | "action choreography matches @Video1" |
| VFX template | `--ref-source-id` (video) | "use the visual-effects template from @Video1" |
| Voice / timbre anchor | `--ref-audio-source-id` | "Use @Audio1 as voice/timbre reference. Speak once, no echo." |

## Prompt-language conventions

- Ref syntax: `@Image1` / `@Video1` / `@Audio1`, positional by flag order. Every `@ImageN`/`@VideoN`/`@AudioN` MUST have a matching `--ref-source-id`/`--ref-audio-source-id` flag — the CLI rejects a mismatch (`bad_args`) before generating. Mentioning the same ref many times is fine; only the highest index per kind needs a flag.
- Spoken text: include script/shot/user dialogue/VO verbatim; do not summarize, translate, shorten, polish, or invent.
- Dialogue scenes: keep the shot/script dialogue in the prompt; use one approved voice sample per speaker as a timbre anchor. Bind each quoted line to the intended character and the matching `@AudioN` reference. Do not generate per-line audio refs unless the user explicitly wants separate final audio.
- Final audio exception: if an audio node is the approved narration/line read, use `audio_result.data.text` verbatim. If it is just a character voice sample, do not replace the shot dialogue with the sample text.
- Add dialogue guards for model-spoken lines: *"each line spoken exactly once, no echo, no repeated reads."* Add phonetic spelling for names or words likely to slur.
- One camera move, one action speed, concrete sound/music (`No Music` if none). Use exact terms: `locked off`, `handheld, subtle`, `slow dolly in`, `slow orbit`, `whip pan`, `speed ramp`.
- Avoid conflicts ("static camera" + "orbit shot").
- For brand / MV / ad work, end the prompt with a negative line: *"no captions, watermarks, distortion, stretching."*
- For polish on a single-shot clip: see [`references/video-single-shot.md`](references/video-single-shot.md).

## Patterns

Pick the one that fits. Source lookup follows `PROJECT_AGENT.md`.

**Storyboard guard:** storyboard images route to Pattern 7 / `references/video-multi-shot.md`, never generic I2V/opening-frame wording.

### 1. Standalone T2V

**Triggers:** fresh clip unrelated to canvas content.
**Call:** `node "$PAI_REPO_ROOT/server/cli/generate_video.js" --prompt "..."`; omitted flags default to 15s, 16:9, 720p, audio on. Add `--resolution 480p` or `--resolution 1080p` only if the chosen video mode requires it.
**Edges:** none.
**For the bracket scaffold and slot-by-slot construction when the user wants polish:** see [`references/video-single-shot.md`](references/video-single-shot.md).

### 2. Animate a canvas image (I2V)

**Triggers:** animate/make video/put motion on a specific canvas `image_result`.
**Source:** named `image_result`; storyboard mosaics route through Storyboard guard.
**Call:** `node "$PAI_REPO_ROOT/server/cli/generate_video.js" --prompt "..." --ref-source-id <image.id>`.
**Edges:** `{ from: <source.id>, to: video_<N>, kind: "derived" }` — emi