voice-conversion-studio
voice-conversion-studio transforms a local audio file into a different voice using an authorized audio provider. Use it when users request voice conversion, voice changing, or narration replacement, provided the source recording has speaker consent and copyright clearance, and the target voice is either provider-licensed or properly authorized.
git clone --depth 1 https://github.com/opensquilla/opensquilla /tmp/voice-conversion-studio && cp -r /tmp/voice-conversion-studio/src/opensquilla/skills/bundled/voice-conversion-studio ~/.claude/skills/voice-conversion-studio

SKILL.md
# voice-conversion-studio

Converts an existing local recording into a target voice using the configured audio provider. OpenRouter can assist with planning or file naming, but the conversion itself must use `voice_convert`.

## Request triage

Before calling tools, extract these fields from the user request:

- source audio path and whether it is local, intentional, and user-provided
- source rights: speaker consent and recording copyright
- target voice: provider-licensed voice, cloned voice ID, or user-provided voice ID
- target language, target locale, desired accent, emotion, pace, and output format
- output expectation: quick conversion sample, final asset, or multiple takes

OpenRouter can help summarize or translate instructions, but it is not an audio provider and cannot authorize voice identity use.

## Required workflow

1. Check the source file is local and intentionally provided.
2. Confirm rights for both sides:
   - source recording copyright and speaker authorization
   - target voice consent or provider-licensed voice
3. Refuse public figure or copyrighted character imitation.
4. Use `audio_provider_capabilities` if conversion availability is uncertain.
5. Call `voice_convert` with `source_audio`, `voice`, optional `output_path`, and any supported provider controls.
6. Return the result as a playable audio artifact when the surface supports it.

## Preview-first

When source quality, accent transfer, or target voice fit is uncertain, convert a short sample before processing a full recording. Recommend re-recording or cleaning the source if the preview contains room echo, background music, strong dialect mismatch, or heavy code-switching.

For multilingual conversion, avoid using a target voice that does not naturally support the target language. A short preview is the fastest way to catch odd accent transfer before spending quota on the whole asset.
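The triage fields and the first three workflow steps can be sketched as a pre-flight check that runs before any call to `voice_convert`. This is a minimal illustration, not the skill's implementation: the `ConversionRequest` class, its field names, `preflight`, and `build_voice_convert_args` are all hypothetical. Only the argument names `source_audio`, `voice`, and `output_path` come from the workflow above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ConversionRequest:
    """Hypothetical container for the triage fields; not part of the skill's API."""
    source_audio: str
    source_consent: bool          # speaker authorized use of the recording
    source_copyright_ok: bool     # user holds or has cleared recording rights
    voice: str                    # provider-licensed, cloned, or user-provided voice ID
    voice_authorized: bool        # target voice consent or provider license
    risky_identity: bool = False  # public figure or copyrighted character target
    output_path: Optional[str] = None


def preflight(req: ConversionRequest) -> list[str]:
    """Return blocking issues; an empty list means the request may proceed."""
    issues = []
    if "://" in req.source_audio:
        issues.append("source must be a local file, not a URL")
    elif not Path(req.source_audio).is_file():
        issues.append(f"source file not found: {req.source_audio}")
    if not (req.source_consent and req.source_copyright_ok):
        issues.append("source rights unconfirmed (speaker consent and copyright)")
    if not req.voice_authorized:
        issues.append("target voice is not authorized or provider-licensed")
    if req.risky_identity:
        issues.append("public figure or copyrighted character imitation is refused")
    return issues


def build_voice_convert_args(req: ConversionRequest) -> dict:
    """Assemble the arguments named in workflow step 5; output_path stays optional."""
    args = {"source_audio": req.source_audio, "voice": req.voice}
    if req.output_path:
        args["output_path"] = req.output_path
    return args
```

Collecting every issue in one pass, rather than stopping at the first, lets the agent ask the user for all missing consent details at once.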
## Tool-result handling

- If `voice_convert` returns `status=ok`, return the playable artifact/path first, then target voice, mime type, and rights summary.
- If it returns `consent_required`, ask for source and target consent metadata instead of attempting a different voice identity.
- If it returns `not_available`, quote the `note` and distinguish provider setup, feature gating, key/quota limits, file format, and source path issues.

## Rights and copyright guard

- Authorization is required for both the source speaker and the target voice.
- Copyright: do not convert songs, movie lines, podcasts, audiobooks, lectures, interviews, or game/animation dialogue unless the user says they have rights.
- Public figure policy: do not convert a recording to sound like a public figure, celebrity, actor, singer, politician, influencer, or fictional character.
- If the user asks for a risky identity target, offer a non-identifying target: "mature calm Mandarin narrator", "bright young commercial voice", etc.

## Locale and accent quality notes

For voice conversion, first identify the target language and locale. The source recording and target voice should be compatible with the desired locale-appropriate accent.

- Chinese neutral narration: prefer a clean Mandarin (Putonghua) source and target voice.
- English: preserve requested locale such as en-US, en-GB, en-AU, en-IN, or en-SG.
- Japanese/Korean/French/German/Spanish/etc.: prefer source/target voices that naturally support that language.
- Strong dialect, background music, reverberation, and heavy code-switching can cause odd accent transfer. Recommend re-recording a short, dry sample before converting a whole script.

## Output contract

Return:

- provider
- target voice
- output path
- mime type
- playable audio artifact status
- rights/consent summary
- target language / locale assumption
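The tool-result handling rules above can be sketched as a small dispatcher on the `status` field. This is an illustrative sketch under assumptions: the source confirms only the status values `ok`, `consent_required`, and `not_available` and the `note` field. The function name and the result keys `output_path`, `voice`, `mime_type`, and `rights_summary` are hypothetical and may not match what a real provider returns.

```python
def handle_voice_convert_result(result: dict) -> str:
    """Turn a voice_convert result into a user-facing reply (hypothetical key names)."""
    status = result.get("status")

    if status == "ok":
        # Artifact/path first, then target voice, mime type, and rights summary.
        return "\n".join([
            f"output path: {result.get('output_path', 'unknown')}",
            f"target voice: {result.get('voice', 'unknown')}",
            f"mime type: {result.get('mime_type', 'unknown')}",
            f"rights: {result.get('rights_summary', 'not provided')}",
        ])

    if status == "consent_required":
        # Ask for consent metadata; do not retry with a different voice identity.
        return ("Please provide consent metadata for the source speaker "
                "and the target voice.")

    if status == "not_available":
        # Quote the provider note verbatim so the user can see the actual cause.
        note = result.get("note", "")
        return (f'Conversion is not available. Provider note: "{note}". '
                "Possible causes: provider setup, feature gating, key/quota "
                "limits, file format, or source path.")

    return f"Unexpected status: {status!r}"
```

Keeping the `consent_required` branch free of any retry logic mirrors the rule that a consent failure is resolved by the user, not by switching to another voice.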
Submit audio or video for multilingual dubbing, poll status, and download dubbed audio. Use when the user asks for dubbing, 多语言配音 (multilingual dubbing), 视频翻译配音 (video translation dubbing), 译制片 (dubbed film), or wants a source clip dubbed into another language.
Generate a structured short-video shooting script from a topic. Emits a strict, machine-parseable shot list (3 shots by default) with an image prompt, video prompt, voiceover, and on-screen text per shot. Trigger when the user asks for a video script, 分镜 (storyboard), 短视频文案 (short-video copy), AI视频 (AI video), 短剧脚本 (short-drama script), or wants visual prompts ready for image/video generation.
Use when the user asks to schedule recurring tasks, one-off reminders, timers, or cron-style jobs through the OpenSquilla cron tool.
Multi-round research with explicit methodology, evidence tracking, and citation-tagged synthesis. Trigger on 'deep dive', 'research report', 'literature review', 'investigate X across sources', 'multi-round investigation'. Distinct from the `summarize` skill, which is a single-pass condensation; this skill maintains a state file across iterations, tracks coverage, and produces a long-form report with per-claim citations. Three execution stages: plan (scope into sub-questions), iterate (record evidence per round), compile (synthesize report). The skill itself does not fetch the web — it tells the host agent which fetches to perform via OpenSquilla's existing web tools, and records what comes back.
Read, edit, or create Microsoft Word `.docx` files. Trigger this skill whenever the user mentions a Word document, .docx file, contract, report, brief, memo, or asks to extract text, modify an existing doc, generate one from a brief, or audit tracked changes. Three execution paths: text-and-structure extraction, in-place edit-by-run (preserves styles), and create-from-scratch with python-docx. Falls back to OOXML unzip-and-patch for layout work python-docx cannot reach.
Capture the current git diff (staged, working-tree, or staged file list) as text. Direct shell call for workflows that need repository diffs without an LLM agent loop.
GitHub operations via `gh` CLI: issues, PRs, CI runs, code review, API queries. Use when: (1) checking PR status or CI, (2) creating/commenting on issues, (3) listing/filtering PRs or issues, (4) viewing run logs. NOT for: complex web UI interactions requiring manual browser flows (use browser tooling when available), bulk operations across many repos (script with gh api), or when gh auth is not configured.
Query the per-turn DecisionEntry log for skill co-occurrence patterns, meta-skill usage stats, and the router fixture corpus. Returns a JSON summary suitable for downstream LLM consumption. Used by meta-skill-creator's harvest step but also useful standalone for 'which skills did I use most this week?'