higgsfield-audio
git clone --depth 1 https://github.com/OSideMedia/higgsfield-ai-prompt-skill /tmp/higgsfield-audio && cp -r /tmp/higgsfield-audio/skills/higgsfield-audio ~/.claude/skills/higgsfield-audio
SKILL.md
# Higgsfield Audio Prompting Guide

## Which Models Support Audio?

| Model | Audio type | Dialogue | SFX | Ambient | BGM | Lip-sync |
|-------|-----------|----------|-----|---------|-----|----------|
| Kling 3.0 / Omni | Native joint | ✅ | ✅ | ✅ | ✅ | ✅ Multi-language |
| Seedance 2.0 | Native joint | ✅ | ✅ | ✅ | ✅ | ✅ Multi-language |
| Seedance 1.5 Pro | Native joint | ✅ | ✅ | ✅ | ✅ | ✅ Best lip-sync |
| Veo 3 / 3.1 | Native joint | ✅ | ✅ | ✅ | ✅ | ✅ English best |
| Grok Imagine Video | Native joint | ✅ | ✅ | ✅ | ✅ | ✅ |
| All other models | ❌ | — | — | — | — | — |

**"Native joint"** means audio and video are generated simultaneously in one pass, not layered on afterward. This produces natural synchronization without post-production.

Models without native audio: add audio in post with Lipsync Studio or external tools.

---

## The Four Audio Layers

Every audio-capable prompt should consider four layers. You don't need all four in every prompt, but knowing which to include gives the model clear direction.

### 1. Dialogue: What characters say

Put dialogue in quotes. Be explicit about who speaks, their tone, and language.

```
She says: "We need to leave. Now."
He whispers: "Not yet."
```

**Best practices:**
- Keep dialogue short: 1-2 sentences per character per shot
- Specify emotional tone: "says urgently", "whispers", "shouts across the room"
- For non-English: specify language and dialect → `She speaks in Cantonese: "走啦"`
- For Seedance 1.5 Pro: supports English, Chinese (incl. Sichuanese, Cantonese, Taiwanese Mandarin, Shanghainese), Japanese, Korean, Spanish, Indonesian

### 2. SFX: Specific sound events tied to action

Describe SFX at the point they happen. Tie them to visible actions.

```
The glass shatters on the floor — sharp crack, then settling tinkle.
Footsteps on wet concrete — splashing, rhythmic.
A door slams shut — heavy metal, echoing.
```

**Best practices:**
- One SFX description per action beat
- Use onomatopoeia sparingly: descriptive phrases work better than "BANG" or "CRASH"
- Tie timing to action: "as she sets the cup down" not "cup sound at 4 seconds"

### 3. Ambient: Background soundscape

Set the acoustic environment. This is the continuous sound bed.

```
Ambient: quiet café murmur, espresso machine, rain against windows.
Ambient: forest at night — crickets, distant owl, gentle wind through leaves.
Ambient: busy intersection — traffic, horns, construction in the distance.
```

**Best practices:**
- 2-3 ambient elements maximum; more gets muddy
- Describe the *space* acoustics: "reverberant church hall", "tight car interior"
- Contrast silence with sound for impact: "Dead silence. Then — a single footstep."

### 4. BGM: Background music mood

Don't name songs or artists (content filter). Describe the musical texture.

```
BGM: slow piano, minor key, melancholic.
BGM: tense orchestral build — low strings, rising.
BGM: lo-fi hip-hop beat, warm vinyl crackle, relaxed.
```

**Best practices:**
- Describe instrumentation, tempo, mood, not genre labels alone
- "Tense strings, building" works better than "suspenseful music"
- Specify when music enters/exits: "Piano enters at the midpoint, builds to the end"
- For beat-sync content: "Cuts match the downbeat" or "Movement peaks on the drop"

---

## Audio Prompt Structure

Add audio cues naturally within your prompt or as a dedicated block at the end.

### Inline method (preferred for short prompts):

```
A woman walks into a quiet library. Her heels click on the marble floor —
each step echoing. She whispers to the librarian: "Do you have the
Collected Letters?" Distant page turns. A clock ticks somewhere above.
```

### Dedicated block method (better for complex audio):

```
[Scene description — visual content, action, camera]

Audio:
Dialogue: She says "We leave at dawn." He replies: "I'll be ready."
SFX: coffee cup set down, chair scraping back
Ambient: early morning kitchen — birds outside, kettle just boiled
BGM: none — silence emphasizes the tension
```

---

## Lip-Sync Rules

Lip-sync is the most failure-prone audio feature. Follow these rules strictly:

### Do:
- Keep dialogue clips 3–8 seconds (sweet spot for accuracy)
- Use medium close-up or closer framing: the model needs to see the mouth clearly
- One speaking face per shot; multiple faces break audio routing
- Lock the camera: `locked-off static camera` or `slow Dolly In` only
- Remove all head/face motion tokens: `nodding`, `turning head`, `looking around` compete with the lip engine and cause desync

### Don't:
- Don't combine dialogue with vigorous head movement in the same prompt
- Don't use 15s clips for lip-sync: that is the technical max, but accuracy degrades past 8s
- Don't include ambient or music tokens if lip-sync is the priority; they invite the generative audio engine to override your dialogue
- Don't use non-MP3 audio for Seedance 2.0 (when available): WAV/AAC/OGG fail silently

### Multi-character dialogue workaround:

Multi-person lip-sync matching is an unresolved limitation across all models. The production workaround:

1. Generate each character separately with their own audio segment
2. Composite in CapCut/Premiere using picture-in-picture + linear mask (15% feather)
3. Static image for the listening character; generated video for the speaking character

---

## Audio by Model: What Works Best Where

### Kling 3.0 (V3) / 3.0 Omni (O3)
- Best overall audio-visual integration
- Multi-language dialogue (English, Chinese, Japanese, Korean, Spanish + regional accents: American English, British English, Indian English)
- Multi-character dialogue: 3+ characters with correct speaker attribution and lip-sync per character
- Voice Binding: lock specific voice profiles to specific characters across shots
- O3 adds Voice Extraction from static images: upload audio clip (min 3s) + image to build a voice profile
- O3 adds Performance Cloning: act out a scene on camera → AI re-renders preserving likeness and voice
- Include dialogue
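
The dedicated block method and the lip-sync rules described earlier can be sketched as a small helper script. This is only an illustration: `build_audio_block` and `lint_lipsync` are hypothetical names, not part of any Higgsfield tool, and the checks encode nothing beyond the guide's own rules (3–8 second clips, no head/face motion tokens, no ambient or music layer when lip-sync is the priority). It also assumes each audio layer sits on its own line, as in the dedicated block format.

```python
import re

# Head/face motion tokens the guide says to remove from lip-sync prompts.
HEAD_MOTION_TOKENS = ("nodding", "turning head", "looking around")


def build_audio_block(dialogue=None, sfx=None, ambient=None, bgm=None):
    """Render the dedicated 'Audio:' block, skipping layers that are not set."""
    layers = [("Dialogue", dialogue), ("SFX", sfx), ("Ambient", ambient), ("BGM", bgm)]
    lines = ["Audio:"]
    lines += [f"{name}: {text}" for name, text in layers if text]
    return "\n".join(lines)


def lint_lipsync(prompt, duration_s):
    """Return warnings for a lip-sync-priority prompt (hypothetical checker)."""
    warnings = []
    if not 3 <= duration_s <= 8:
        warnings.append("duration outside the 3-8 s range")
    lowered = prompt.lower()
    for token in HEAD_MOTION_TOKENS:
        if token in lowered:
            warnings.append(f"head/face motion token present: {token}")
    # An 'Ambient:' or 'BGM:' line counts unless it is explicitly set to none.
    if re.search(r"^(ambient|bgm):(?!\s*none)", lowered, flags=re.MULTILINE):
        warnings.append("ambient or music layer present")
    return warnings


if __name__ == "__main__":
    print(build_audio_block(dialogue='She says "We leave at dawn."',
                            sfx="chair scraping back"))
    print(lint_lipsync('She is nodding.\nBGM: slow piano', duration_s=15))
```

Returning a list of warnings rather than raising keeps the checker advisory, which fits rules that are guidance for prompt writing and not hard limits.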
Guided version bump — validate, tag, and create GitHub release
Run pre-release validation checks on all SKILL.md files and JSON databases
Seedance 2.0 video prompt director. Converts plain-text scene descriptions into production-ready bilingual EN+ZH video prompts optimized for the Seedance 2.0 video generator. Handles action scenes (combat, pursuit, stunts), general scenes (landscapes, journeys, atmosphere), and dialogue scenes (confrontations, negotiations, interrogations). Use this skill whenever the user wants to create a Seedance video prompt, describes a scene for video generation, mentions Seedance, or asks for a cinematic scene breakdown.
Use when the user mentions Higgsfield Canvas, a node-based or node graph workspace, an infinite board/canvas, chaining generations into a pipeline, or wants to wire prompts → images → videos across models on one surface. Covers what Canvas is, the node categories, the seven models that run inside Canvas, the named canvas patterns (Simple Seedance, Extend Video, Image Edit, StoryBoard With Elements, Long Video fan-out), the build-free / generate-paid cost model, reusable templates, assets-as-nodes, and Shared Canvas live collaboration. Also trigger on 'Higgsfield ComfyUI alternative', 'node workflow', or 'connect nodes to build a scene/campaign'.