voice-clone-lab
voice-clone-lab creates reusable synthetic voices from local audio samples for text-to-speech applications, provided explicit speaker consent is documented. Use it when a user requests voice cloning from a personal audio sample they control and have permission to use. Before processing the clone request, capture consent metadata: speaker identity, permitted use cases, sample source, and commercial restrictions.
git clone --depth 1 https://github.com/opensquilla/opensquilla /tmp/voice-clone-lab && cp -r /tmp/voice-clone-lab/src/opensquilla/skills/bundled/voice-clone-lab ~/.claude/skills/voice-clone-lab
SKILL.md
# voice-clone-lab

Creates a reusable provider voice from a local sample. OpenRouter may help summarize the request or produce labels, but cloning must use the direct audio provider through `voice_clone`.

## Request triage

Before calling tools, extract these fields from the user request:

- sample path and whether the file is local, intentional, and user-provided
- speaker identity class: self, employee/team member, private person, public figure, fictional character, or unknown
- consent metadata: speaker, consent, sample source, permitted use, requested by, retention expectation, and whether commercial use is allowed
- target use: TTS narration, IVR, dubbing, training content, or internal demo
- target language, target locale, and desired locale-appropriate accent

OpenRouter can summarize consent text or label a voice, but it is not an audio provider and cannot replace explicit consent.

## Consent-first workflow

1. Confirm the sample audio path is local and intentionally provided.
2. Require `consent_metadata` before calling `voice_clone`.
3. Include at minimum:
   - `speaker`
   - `consent: true`
   - `sample_source`
   - `permitted_use`
   - `requested_by`
4. Reject or stop when consent is missing, vague, or contradicted by the request.
5. Call `audio_provider_capabilities` if cloning availability is uncertain.
6. Call `voice_clone` with the sample, name, description, and consent metadata.
7. Return the created voice ID and the allowed usage summary.

## Tool-result handling

- If `voice_clone` returns `status=ok`, return the voice ID first, then the consent summary, intended locale/accent, and any sample-quality warning.
- If it returns `consent_required`, do not proceed with a workaround. Ask for the missing consent metadata in one concise question.
- If the provider returns `not_available`, quote the `note` and distinguish disabled provider, key/quota limits, feature gating, and sample format issues.
- Never suggest scraping, downloading, or extracting third-party voice samples as a fallback.

## Rights and copyright guard

- Authorization (授权) is mandatory. The speaker must own or control the voice sample and agree to cloning for this use.
- Copyright (版权): do not use copyrighted recordings, film/TV/game clips, music stems, interviews, or scraped audio unless the user states they have rights.
- Public figure policy: do not clone or imitate a public figure, celebrity, politician, influencer, actor, singer, or fictional character voice.
- Do not help bypass provider safety checks or watermark/disclosure duties.
- Store only the returned provider voice ID and consent summary in ordinary output; do not duplicate raw sample audio.

## Locale and accent quality notes

Ask which target language and locale the cloned voice will be used for. A clone works best when the sample matches the desired locale-appropriate accent.

- Chinese neutral narration: use clean Mandarin (普通话) sample audio.
- American English: use clean en-US sample audio.
- British English: use clean en-GB sample audio.
- Japanese/Korean/French/German/Spanish/etc.: use samples spoken in that target language, not an English sample repurposed cross-lingually.
- Strong dialect, code-switching, room echo, music, or singing can produce odd accent transfer in later TTS. Recommend 30-90 seconds of dry speech when possible.

## Output contract

Return:

- provider
- voice ID
- voice name
- consent summary
- allowed use
- target language / locale assumption
- warning if the source sample quality may harm target-language accent quality
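
The consent check in the workflow and the status branches in the tool-result handling rules can be sketched in Python. This is only an illustration, not the skill's actual implementation: the helper names `missing_consent_fields` and `summarize_tool_result` are hypothetical, the result is assumed to arrive as a dict, and the `voice_id` key is an assumption (the source names only `status` and `note`).

```python
from typing import Any, Dict, List, Optional

# Minimum fields listed in step 3 of the consent-first workflow.
REQUIRED_CONSENT_FIELDS = (
    "speaker",
    "consent",
    "sample_source",
    "permitted_use",
    "requested_by",
)


def _is_blank(value: Any) -> bool:
    """Treat None, empty/whitespace strings, and empty containers as blank."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def missing_consent_fields(consent_metadata: Optional[Dict[str, Any]]) -> List[str]:
    """Return required fields that are absent, blank, or not affirmative."""
    if not consent_metadata:
        return list(REQUIRED_CONSENT_FIELDS)
    missing = []
    for field in REQUIRED_CONSENT_FIELDS:
        value = consent_metadata.get(field)
        if field == "consent":
            # Only an explicit boolean True counts as consent.
            if value is not True:
                missing.append(field)
        elif _is_blank(value):
            missing.append(field)
    return missing


def summarize_tool_result(result: Dict[str, Any]) -> str:
    """Map an assumed voice_clone result dict onto the handling rules."""
    status = result.get("status")
    if status == "ok":
        # Voice ID comes first; "voice_id" is a hypothetical key name.
        return f"voice ID: {result.get('voice_id')}"
    if status == "consent_required":
        # No workaround: ask for the missing metadata in one question.
        return "Please provide the missing consent metadata."
    if status == "not_available":
        # Quote the provider's note verbatim.
        return f"Provider note: {result.get('note')}"
    return f"Unrecognized status: {status!r}"
```

A caller would run `missing_consent_fields` before `voice_clone` and stop when the returned list is non-empty, which mirrors step 4 of the workflow.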
Submit audio or video for multilingual dubbing, poll status, and download dubbed audio. Use when the user asks for dubbing, 多语言配音 (multilingual dubbing), 视频翻译配音 (video translation dubbing), 译制片 (dubbed film), or wants a source clip dubbed into another language.
Generate a structured short-video shooting script from a topic. Emits a strict, machine-parseable shot list (3 shots by default) with an image prompt, video prompt, voiceover, and on-screen text per shot. Trigger when the user asks for a video script, 分镜 (storyboard), 短视频文案 (short-video copy), AI视频 (AI video), 短剧脚本 (short-drama script), or wants visual prompts ready for image/video generation.
Use when the user asks to schedule recurring tasks, one-off reminders, timers, or cron-style jobs through the OpenSquilla cron tool.
Multi-round research with explicit methodology, evidence tracking, and citation-tagged synthesis. Trigger on 'deep dive', 'research report', 'literature review', 'investigate X across sources', 'multi-round investigation'. Distinct from the `summarize` skill, which is a single-pass condensation; this skill maintains a state file across iterations, tracks coverage, and produces a long-form report with per-claim citations. Three execution stages: plan (scope into sub-questions), iterate (record evidence per round), compile (synthesize report). The skill itself does not fetch the web — it tells the host agent which fetches to perform via OpenSquilla's existing web tools, and records what comes back.
Read, edit, or create Microsoft Word `.docx` files. Trigger this skill whenever the user mentions a Word document, .docx file, contract, report, brief, memo, or asks to extract text, modify an existing doc, generate one from a brief, or audit tracked changes. Three execution paths: text-and-structure extraction, in-place edit-by-run (preserves styles), and create-from-scratch with python-docx. Falls back to OOXML unzip-and-patch for layout work python-docx cannot reach.
Capture the current git diff (staged, working-tree, or staged file list) as text. Direct shell call for workflows that need repository diffs without an LLM agent loop.
GitHub operations via `gh` CLI: issues, PRs, CI runs, code review, API queries. Use when: (1) checking PR status or CI, (2) creating/commenting on issues, (3) listing/filtering PRs or issues, (4) viewing run logs. NOT for: complex web UI interactions requiring manual browser flows (use browser tooling when available), bulk operations across many repos (script with gh api), or when gh auth is not configured.
Query the per-turn DecisionEntry log for skill co-occurrence patterns, meta-skill usage stats, and the router fixture corpus. Returns a JSON summary suitable for downstream LLM consumption. Used by meta-skill-creator's harvest step but also useful standalone for 'which skills did I use most this week?'