Skill · 318 repo stars · updated 4d ago

voice-compose

voice-compose designs and generates character voice samples and narration audio for filmmaking projects by calling the local generate_voice.js CLI with specific text and voice descriptions. Use this skill when creating initial character voice anchors for video dialogue, previewing how a character sounds, or generating final narration and line reads that will be used downstream in video composition.

Repository: pai-pro

Install in Claude Code

git clone --depth 1 https://github.com/Utopai-Research/pai-pro /tmp/voice-compose && cp -r /tmp/voice-compose/skills/voice-compose ~/.claude/skills/voice-compose

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

By default, create one short, reusable timbre sample per speaking character, plus one VO/narrator sample when the project has narration. `video-compose` keeps the actual shot dialogue and VO in the video prompt. Treat `audio_result.data.text` as the speech used downstream only when it is approved final narration or an approved final line read.
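
A minimal POSIX `sh` sketch of that default plan, for illustration only: `plan_voice_samples` is a hypothetical helper, not part of pai-pro. It calls nothing; it just lists which samples to create (one per speaking character, plus one narrator sample when narration exists).

```sh
# Hypothetical helper (not part of pai-pro): list the default voice samples.
# Usage: plan_voice_samples HAS_NARRATION SPEAKER...
#   HAS_NARRATION is "yes" or "no"; each SPEAKER gets one timbre sample.
plan_voice_samples() {
  has_narration=$1
  shift
  for speaker in "$@"; do
    printf 'character:%s\n' "$speaker"
  done
  # One extra VO/narrator sample only when the project has narration.
  if [ "$has_narration" = "yes" ]; then
    printf 'narrator\n'
  fi
}

# plan_voice_samples yes Morris Ada
# prints:
#   character:Morris
#   character:Ada
#   narrator
```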

## Patterns

Follow `PROJECT_AGENT.md` for context and staging. This skill owns the voice prompt and the CLI call shape.

### 1. Character voice sample

Triggers: "give / design a voice for [character]", "what does [character] sound like", "voices for all the characters on the canvas".

- Target: any `image_result` for the person; don't gate on its subtype. Read the image at `data.local_path` before writing the prompt, then layer `name`/`role`/`description` on top.
- Call:
  ```
  node "$PAI_REPO_ROOT/server/cli/generate_voice.js" \
    --text "<line>" \
    --prompt "<voice design brief>" \
    --source-node-id <character.id>
  ```
- Prompt describes the **voice**, not the character:
  > `[age bracket] [gender], [timbre], [register], [pace], [accent if relevant]. [optional emotional color].`

  ✅ "Mid-50s man, gravelly baritone, measured pace, slight rasp from decades of smoking, weary but steady."
  ✅ "Young woman, bright mezzo, warm, quick and percussive. Slight Southern lilt."
  ❌ "Detective Morris's voice." — names the character, not the voice. The model needs sound qualities.
- `--text`: a 1-3 sentence in-character sample (≤200 characters), not every line from the script.
- Script breakdowns: make one staged call per speaker and preserve the speaker labels. Add a separate VO/narrator sample via Pattern 2.
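
The prompt convention and the `--text` limits above can be sketched as two hypothetical POSIX `sh` helpers; neither is part of pai-pro. `voice_brief` fills the `[age bracket] [gender], [timbre], [register], [pace], [accent if relevant]. [optional emotional color].` template in order, and `check_sample_text` enforces the ≤200-character limit plus a rough 1-3 sentence count. The sentence count is only a heuristic: it counts `.`, `!`, and `?`, so abbreviations like "Dr." or an ellipsis inflate it.

```sh
# Hypothetical helpers, not part of pai-pro.

# Assemble a voice design brief in the order
# "[age bracket] [gender], [timbre], [register], [pace], [accent]. [emotional color]."
# Pass "" for the optional accent or emotional color to leave them out.
voice_brief() {
  age_gender=$1 timbre=$2 register=$3 pace=$4 accent=$5 color=$6
  brief="$age_gender, $timbre, $register, $pace"
  if [ -n "$accent" ]; then
    brief="$brief, $accent"
  fi
  brief="$brief."
  if [ -n "$color" ]; then
    brief="$brief $color."
  fi
  printf '%s\n' "$brief"
}

# Check a --text sample: at most 200 characters and roughly 1-3 sentences.
# Sentences are approximated by counting ".", "!" and "?" characters.
# Note: some shells (e.g. dash) count bytes, not characters, in ${#text}.
check_sample_text() {
  text=$1
  if [ "${#text}" -gt 200 ]; then
    echo "sample too long: ${#text} chars (max 200)" >&2
    return 1
  fi
  sentences=$(printf '%s' "$text" | tr -cd '.!?' | wc -c | tr -d ' ')
  if [ "$sentences" -lt 1 ] || [ "$sentences" -gt 3 ]; then
    echo "sample has $sentences sentence(s); expected 1-3" >&2
    return 1
  fi
}

# Usage with the Pattern 1 call shape (<character.id> is a placeholder):
# line="You again. Sit down."
# check_sample_text "$line" &&
#   node "$PAI_REPO_ROOT/server/cli/generate_voice.js" \
#     --text "$line" \
#     --prompt "$(voice_brief "Mid-50s man" "gravelly baritone" "low register" \
#       "measured pace" "" "Weary but steady")" \
#     --source-node-id "<character.id>"
```

Running the check before the CLI call keeps an over-long or script-length sample from ever reaching `generate_voice.js`.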

### 2. Narrator / VO voice sample or final line audio

Triggers: a narrator voice, voice-over, "a voice that says X" with no character attached, a narration track, or an explicit request for final line-read audio.

- Omit `--source-node-id`:
  ```
  node "$PAI_REPO_ROOT/server/cli/generate_voice.js" \
    --text "<the narration line>" \
    --prompt "<voice design brief>"
  ```
- Same prompt convention as Pattern 1.
- Reusable VO/narrator anchor: a short sample line in the narrator's style, not the full script narration.
- Final narration/line read: copy the approved text exactly into `--text`; the resulting `data.text` is then the source of truth.
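
A hedged POSIX `sh` sketch of one wrapper covering both call shapes: given a node ID it builds the Pattern 1 call, and without one it omits `--source-node-id` entirely, as Pattern 2 requires. `run_generate_voice` and the `VOICE_DRY_RUN` switch are hypothetical, not pai-pro features; only the `--text`, `--prompt`, and `--source-node-id` flags come from this skill. Passing the text as a single quoted argument keeps approved narration exactly as written, with no word splitting or globbing.

```sh
# Hypothetical wrapper (not part of pai-pro) around generate_voice.js.
# Usage: run_generate_voice TEXT PROMPT [SOURCE_NODE_ID]
#   With SOURCE_NODE_ID: character sample (Pattern 1).
#   Without it: narrator/VO or final line audio (Pattern 2).
# Set VOICE_DRY_RUN=1 to print the arguments instead of calling node.
run_generate_voice() {
  text=$1 prompt=$2 source_id=${3-}
  # Rebuild this function's positional parameters as the CLI argument list;
  # quoting "$text" passes the approved line through unchanged.
  set -- "$PAI_REPO_ROOT/server/cli/generate_voice.js" --text "$text" --prompt "$prompt"
  if [ -n "$source_id" ]; then
    set -- "$@" --source-node-id "$source_id"
  fi
  if [ "${VOICE_DRY_RUN-}" = "1" ]; then
    printf '%s\n' "$@"   # one argument per line, for inspection
  else
    node "$@"
  fi
}
```

Building the argument list with `set --` instead of a concatenated string is what keeps quotes, apostrophes, and extra spaces in the approved line intact.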

More from this repository

groups-compose (skill)

Designs and maintains semantic groupings and readable layouts on the filmmaking canvas — scenes, character-reference sets, act beats, and other titled visual frames. Use when nodes on the canvas cluster around a shared meaning and would read more clearly if arranged together and wrapped in a frame. Don't force it — groups are a view concern, not an organizing tax.

image-compose (skill)

script-compose (skill)

story-to-video-workflow (skill)

video-compose (skill)

Generates and prompts video clips on the filmmaking canvas. Use when the user asks to generate, render, animate, continue, restyle, edit, shoot, or compose a video clip; render script or shot notes as video; animate a storyboard, starting frame, image, character, location, or reference; use image, video, audio, storyboard, starting-frame, or voice refs; compose an ad, brand film, product promo, music-video shot, or video sequence; or before calling generate_video.js. Owns video CLI flags, refs, prompt construction, audio-ref handling, and video-specific failure hints.