seedance-2-prompt
Seedance-2-prompt generates short AI video clips (3-15 seconds) from structured English prompts using Seedance 2.0, supporting OpenRouter or ByteDance's ARK endpoints. Use it when users request AI video generation with optional first-frame or reference images for character consistency across shots.
git clone --depth 1 https://github.com/opensquilla/opensquilla /tmp/seedance-2-prompt && cp -r /tmp/seedance-2-prompt/src/opensquilla/skills/bundled/seedance-2-prompt ~/.claude/skills/seedance-2-promptSKILL.md
# seedance-2-prompt — Seedance 2.0 video clip generator (dual backend)
Submits a Seedance 2.0 generation job and downloads the resulting MP4.
Two backends share one CLI, picked via `with.provider`:
| `with.provider` | Endpoint | Auth env | Default model |
|---|---|---|---|
| `openrouter` (default) | `https://openrouter.ai/api/v1/videos` | `OPENROUTER_API_KEY` | `bytedance/seedance-2.0` |
| `volcengine` (CN) | `https://ark.cn-beijing.volces.com/api/v3/contents/generations/tasks` | `ARK_API_KEY` (or `VOLC_ARK_API_KEY`) | `doubao-seedance-2-0-260128` |
| `byteplus` (intl) | `https://ark.ap-southeast.bytepluses.com/api/v3/contents/generations/tasks` | `ARK_API_KEY` (or `BYTEPLUS_API_KEY`) | `dreamina-seedance-2-0-260128` |
Both flavours follow submit-then-poll, but differ in request shape
(OpenRouter uses a flat `prompt` field; ARK packs everything into a
`content[]` array), polling URL (OpenRouter returns `polling_url`; ARK
gets `id` and you construct `/contents/generations/tasks/{id}`), the
terminal-success status (`completed` vs `succeeded`), and where the
final MP4 URL sits (top-level `unsigned_urls[0]` vs `content.video_url`).
This script normalises both into a single Python contract.
## Inputs (`with:`)
| key | required | default | notes |
|---|---|---|---|
| `prompt` | yes | — | Structured English video prompt (use this skill's recipes). |
| `filename` | yes | — | Output `.mp4` path. |
| `provider` | no | `openrouter` | `openrouter`, `volcengine`, or `byteplus`. |
| `aspect_ratio` | no | `9:16` | `9:16`, `16:9`, `1:1`, `4:3`, `3:4`, `21:9`. |
| `duration` | no | `5` | Seconds, 3-15. Provider enforces 4-15 typically. |
| `resolution` | no | `720p` | `480p`, `720p`, `1080p`. Ignored by OpenRouter. |
| `model` | no | provider default | Override model id. Empty means use provider default. |
| `input_image` | no | `""` | Strict first-frame path. If set, video starts from this image. |
| `input_reference` | no | `""` | Primary soft identity/style anchor path. Used only when `input_image` is empty. Same anchor passed across shots locks the character. |
| `input_reference_2` | no | `""` | Optional second reference (e.g. per-shot scene composition). Forwarded as a second `--input-reference` so the underlying provider sees both. Empty strings are filtered out before the API call. |
| `max_retries` | no | `0` | Extra retries on transient submit/poll/download failures or non-success terminal status. `0` = single attempt; `2` = up to 3 total attempts with exponential backoff (2s, 4s, 8s capped at 15s). Set this on flows that fall back to a still-image animator on final failure. |
**`input_image` vs `input_reference`** — `input_image` becomes the literal
first frame. `input_reference` is a softer style + identity hint the
model uses without locking the frame. For multi-shot drama, pass the
same `input_reference` to every shot; pass `input_image` only when you
want a specific opening frame.
## Prompt rules (from upstream + OpenSquilla tightening)
1. **One major action per 3-5s segment.** Don't pack multiple motions.
2. **Identity continuity** — repeat the main character's full
description in every shot's prompt.
3. **Specific over poetic** — `"a young woman in a red trench coat
walks through rain-soaked neon streets"` >> `"a woman walking"`.
4. **Negative constraints inline** — append `no watermark, no logo,
no subtitles, no on-screen text.`
5. **IP-safe** — invent original character/brand names.
6. **Aspect ratio explicit** — append `aspect_ratio: 9:16`.
See `references/recipes.md`, `references/modes-and-recipes.md`,
`references/camera-and-styles.md` for the upstream playbook.
## Auth
- `openrouter` provider API-key resolution order:
1. `--api-key` CLI argument
2. `OPENROUTER_API_KEY` env var
3. `OPENSQUILLA_LLM_API_KEY` env var, only when the effective
OpenSquilla LLM provider resolves to `openrouter`
4. `llm.api_key` or `llm.api_key_env` from the selected OpenSquilla
TOML config. Config discovery matches `GatewayConfig.load`:
explicit `OPENSQUILLA_GATEWAY_CONFIG_PATH` first; otherwise
`./opensquilla.toml`, then `default_opensquilla_home()/config.toml`.
`OPENSQUILLA_STATE_DIR` changes `default_opensquilla_home()`, so a
state-dir profile does not fall through to `~/.opensquilla`.
Config-file credentials are consumed only when the selected config's
`llm.provider` is `openrouter` or omitted.
- `volcengine` / `byteplus` provider reads `ARK_API_KEY` (with provider-
specific fallbacks `VOLC_ARK_API_KEY` / `BYTEPLUS_API_KEY`). No
config-file fallback for these — the OpenSquilla `[llm]` config
describes the agent's selected LLM provider, not ARK / BytePlus video
credentials.
- All three send the key as `Authorization: Bearer <key>`.
- For OpenRouter the same bearer is also added when downloading the
resulting `unsigned_urls[0]`. Volcengine returns pre-signed object
store URLs that reject extra headers, so the downloader strips
Authorization for non-OpenRouter hosts.
## Output
Prints the absolute path of the saved `.mp4` on stdout. Non-zero
exit on any error; stderr carries the diagnostic.
## Cost / latency
- OpenRouter `bytedance/seedance-2.0` 5s @ 9:16 720p: ≈30-90s wall, ≈$0.76.
- Volcengine official 5s 1080p: ≈30-120s wall, ≈$0.93.
- Volcengine `doubao-seedance-2-0-fast-260128`: roughly half cost and faster.
- The returned `unsigned_urls` / `content.video_url` expire 24 hours
after success on the Volcengine path — this script downloads them
before that window so the local mp4 is durable.
## Multi-segment workflows (>15s)
Generate segments individually with `duration ≤ 15`, ending each on a
stable hand-off frame. Stitch with the `video-merger` skill. See
`references/modes-and-recipes.md` § "Multi-Segment Stitching".Submit audio or video for multilingual dubbing, poll status, and download dubbed audio. Use when the user asks for dubbing, 多语言配音, 视频翻译配音, 译制片, or wants a source clip dubbed into another language.
Generate a structured short-video shooting script from a topic. Emits a strict, machine-parseable shot list (3 shots by default) with image prompt + video prompt + voiceover + on-screen text per shot. Trigger when the user asks for a video script, 分镜, 短视频文案, AI视频, 短剧脚本, or wants visual prompts ready for image/video generation.
Use when the user asks to schedule recurring tasks, one-off reminders, timers, or cron-style jobs through the OpenSquilla cron tool.
Multi-round research with explicit methodology, evidence tracking, and citation-tagged synthesis. Trigger on 'deep dive', 'research report', 'literature review', 'investigate X across sources', 'multi-round investigation'. Distinct from the `summarize` skill, which is a single-pass condensation; this skill maintains a state file across iterations, tracks coverage, and produces a long-form report with per-claim citations. Three execution stages: plan (scope into sub-questions), iterate (record evidence per round), compile (synthesize report). The skill itself does not fetch the web — it tells the host agent which fetches to perform via OpenSquilla's existing web tools, and records what comes back.
Read, edit, or create Microsoft Word `.docx` files. Trigger this skill whenever the user mentions a Word document, .docx file, contract, report, brief, memo, or asks to extract text, modify an existing doc, generate one from a brief, or audit tracked changes. Three execution paths: text-and-structure extraction, in-place edit-by-run (preserves styles), and create-from-scratch with python-docx. Falls back to OOXML unzip-and-patch for layout work python-docx cannot reach.
Capture the current git diff (staged, working-tree, or staged file list) as text. Direct shell call for workflows that need repository diffs without an LLM agent loop.
GitHub operations via `gh` CLI: issues, PRs, CI runs, code review, API queries. Use when: (1) checking PR status or CI, (2) creating/commenting on issues, (3) listing/filtering PRs or issues, (4) viewing run logs. NOT for: complex web UI interactions requiring manual browser flows (use browser tooling when available), bulk operations across many repos (script with gh api), or when gh auth is not configured.
Query the per-turn DecisionEntry log for skill co-occurrence patterns, meta-skill usage stats, and the router fixture corpus. Returns a JSON summary suitable for downstream LLM consumption. Used by meta-skill-creator's harvest step but also useful standalone for 'which skills did I use most this week?'