voice-clone-lab

voice-clone-lab creates reusable synthetic voices from local audio samples for text-to-speech applications, provided that explicit speaker consent is documented. Use it when a user requests voice cloning with a personal audio sample that they control and have permission to use. Before processing the clone request, capture consent metadata: speaker identity, permitted use cases, sample source, and commercial restrictions.

Repository: opensquilla

Install in Claude Code

git clone --depth 1 https://github.com/opensquilla/opensquilla /tmp/voice-clone-lab && cp -r /tmp/voice-clone-lab/src/opensquilla/skills/bundled/voice-clone-lab ~/.claude/skills/voice-clone-lab

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# voice-clone-lab

Creates a reusable provider voice from a local sample. OpenRouter may help
summarize the request or produce labels, but cloning must use the direct
audio provider through `voice_clone`.

## Request triage

Before calling tools, extract these fields from the user request:

- sample path and whether the file is local, intentional, and user-provided
- speaker identity class: self, employee/team member, private person, public
  figure, fictional character, or unknown
- consent metadata: speaker, consent, sample source, permitted use, requested
  by, retention expectation, and whether commercial use is allowed
- target use: TTS narration, IVR, dubbing, training content, or internal demo
- target language, target locale, and desired locale-appropriate accent

OpenRouter can summarize consent text or label a voice, but it is not an audio
provider and cannot replace explicit consent.
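
The triage fields above can be held in one record so that gaps are visible before any tool call. This is a minimal sketch, not part of the skill: the `CloneRequest` class, its attribute names, and `missing_fields` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CloneRequest:
    """Hypothetical container for the triage fields; not an OpenSquilla type."""

    sample_path: str
    sample_is_local: bool
    identity_class: str = "unknown"  # self, employee/team member, private person, ...
    consent_metadata: dict = field(default_factory=dict)
    target_use: Optional[str] = None  # e.g. "TTS narration", "IVR", "dubbing"
    target_language: Optional[str] = None
    target_locale: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the names of triage fields that are still empty."""
        checked = ("consent_metadata", "target_use", "target_language", "target_locale")
        return [name for name in checked if not getattr(self, name)]


request = CloneRequest(
    sample_path="samples/me.wav",
    sample_is_local=True,
    identity_class="self",
)
print(request.missing_fields())
```

An empty result from `missing_fields()` only means the fields were filled in; it does not by itself establish that consent is valid.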

## Consent-first workflow

1. Confirm the sample audio path is local and intentionally provided.
2. Require `consent_metadata` before calling `voice_clone`.
3. Include at minimum:
   - `speaker`
   - `consent: true`
   - `sample_source`
   - `permitted_use`
   - `requested_by`
4. Reject or stop when consent is missing, vague, or contradicted by the
   request.
5. Call `audio_provider_capabilities` if cloning availability is uncertain.
6. Call `voice_clone` with the sample, name, description, and consent metadata.
7. Return the created voice ID and the allowed usage summary.
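
Steps 2 to 4 can be sketched as a check that runs before `voice_clone` is ever called. The key names come from the list above; the function name `consent_problems` and the wording of its messages are assumptions made for this example.

```python
from typing import List

# Text fields taken from the minimum list in step 3.
REQUIRED_TEXT_KEYS = ("speaker", "sample_source", "permitted_use", "requested_by")


def consent_problems(consent_metadata: dict) -> List[str]:
    """Return reasons to stop; an empty list means the minimum fields are present."""
    problems = []
    for key in REQUIRED_TEXT_KEYS:
        value = consent_metadata.get(key)
        # Treat absent, non-string, and blank values as missing.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing {key}")
    # Only the boolean True counts; "yes", 1, or "true" are treated as vague.
    if consent_metadata.get("consent") is not True:
        problems.append("consent must be exactly true")
    return problems


metadata = {
    "speaker": "Alex Example",
    "consent": True,
    "sample_source": "recorded by the speaker on 2024-01-01",
    "permitted_use": "internal demo narration",
    "requested_by": "Alex Example",
}
print(consent_problems(metadata))
```

This check covers only missing or vague metadata. Whether the consent is contradicted by the rest of the request still needs judgment that a field check cannot provide.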

## Tool-result handling

- If `voice_clone` returns `status=ok`, return the voice ID first, then the
  consent summary, intended locale/accent, and any sample-quality warning.
- If it returns `consent_required`, do not proceed with a workaround. Ask for
  the missing consent metadata in one concise question.
- If the provider returns `not_available`, quote the `note` and state which
  cause applies: a disabled provider, key or quota limits, feature gating, or a
  sample format issue.
- Never suggest scraping, downloading, or extracting third-party voice samples
  as a fallback.
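
The branching above might look like the following sketch. The `status` values and the `note` field are taken from the text; the result being a plain dictionary and the `voice_id` key are assumptions, since the actual shape of the `voice_clone` result is not shown here.

```python
def next_action(result: dict) -> str:
    """Map a hypothetical voice_clone result to the reply the skill should give."""
    status = result.get("status")
    if status == "ok":
        # Voice ID goes first; the consent summary and warnings follow it.
        return f"Voice ID: {result.get('voice_id')}"
    if status == "consent_required":
        # One concise question, no workaround.
        return "Which consent details are missing, and can you provide them?"
    if status == "not_available":
        note = result.get("note", "")
        return f'Provider note: "{note}"'
    return f"Unexpected status: {status!r}"


print(next_action({"status": "not_available", "note": "voice cloning is disabled"}))
```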

## Rights and copyright guard

- Authorization is mandatory. The speaker must own or control the voice sample
  and agree to cloning for this use.
- Copyright: do not use copyrighted recordings, film/TV/game clips, music
  stems, interviews, or scraped audio unless the user states they have rights.
- Public figure policy: do not clone or imitate a public figure, celebrity,
  politician, influencer, actor, singer, or fictional character voice.
- Do not help bypass provider safety checks or watermark/disclosure duties.
- Store only the returned provider voice ID and consent summary in ordinary
  output; do not duplicate raw sample audio.
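
The public figure policy can be applied to the speaker identity class gathered during triage. In this sketch the function name and the three return values are hypothetical, and treating `unknown` as a reason to ask rather than to refuse is an assumption, not something the policy states.

```python
# Identity classes that the public figure policy rules out.
BLOCKED_CLASSES = {"public figure", "fictional character"}


def identity_decision(identity_class: str) -> str:
    """Return "refuse", "ask", or "continue" for a triaged identity class."""
    normalized = identity_class.strip().lower()
    if normalized in BLOCKED_CLASSES:
        return "refuse"
    if normalized == "unknown":
        # Assumption: clarify who the speaker is before going further.
        return "ask"
    return "continue"


print(identity_decision("Public figure"))
```

A "continue" result still leaves the authorization and copyright checks to be satisfied.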

## Locale and accent quality notes

Ask which target language and locale the cloned voice will be used for. A clone
works best when the sample matches the desired locale-appropriate accent.

- Chinese neutral narration: use clean Mandarin (Putonghua) sample audio.
- American English: use clean en-US sample audio.
- British English: use clean en-GB sample audio.
- Japanese/Korean/French/German/Spanish/etc.: use samples spoken in that target
  language, not an English sample repurposed cross-lingually.
- Strong dialect, code-switching, room echo, music, or singing can produce odd
  accent transfer in later TTS. Recommend 30-90 seconds of dry speech when
  possible.
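
A rough pre-flight check based on these notes could flag samples outside the recommended length or in a different locale than the target. The function and parameter names are hypothetical, and the duration and sample locale are assumed to be supplied by the caller; nothing here inspects the audio itself.

```python
from typing import List, Optional


def sample_warnings(
    duration_seconds: float,
    sample_locale: Optional[str],
    target_locale: Optional[str],
) -> List[str]:
    """Return quality warnings derived from the locale and accent notes."""
    warnings = []
    # 30-90 seconds of dry speech is the recommended range.
    if not 30 <= duration_seconds <= 90:
        warnings.append("sample length is outside the recommended 30-90 seconds")
    # Compare locale tags case-insensitively, e.g. "en-US" and "en-us".
    if sample_locale and target_locale and sample_locale.lower() != target_locale.lower():
        warnings.append(
            f"sample locale {sample_locale} differs from target locale {target_locale}"
        )
    return warnings


print(sample_warnings(12.0, "en-US", "en-GB"))
```

Room echo, music, singing, and code-switching are not detectable from these inputs and would need a listening check or a separate analysis step.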

## Output contract

Return:

- provider
- voice ID
- voice name
- consent summary
- allowed use
- target language / locale assumption
- warning if the source sample quality may harm target-language accent quality
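
One way to assemble the fields listed above is a small builder that keeps them in the listed order. The function name, the dictionary keys, and the format of the consent summary are assumptions for illustration; the skill does not prescribe a serialization.

```python
from typing import Optional


def build_output(
    provider: str,
    voice_id: str,
    voice_name: str,
    consent_metadata: dict,
    target_locale: str,
    quality_warning: Optional[str] = None,
) -> dict:
    """Collect the output-contract fields into one dictionary."""
    output = {
        "provider": provider,
        "voice_id": voice_id,
        "voice_name": voice_name,
        "consent_summary": (
            f"{consent_metadata.get('speaker')} consented; "
            f"source: {consent_metadata.get('sample_source')}; "
            f"requested by {consent_metadata.get('requested_by')}"
        ),
        "allowed_use": consent_metadata.get("permitted_use"),
        "target_locale": target_locale,
    }
    # The warning is included only when there is something to report.
    if quality_warning:
        output["warning"] = quality_warning
    return output


result = build_output(
    provider="example-provider",
    voice_id="v_123",
    voice_name="Narrator",
    consent_metadata={
        "speaker": "Alex Example",
        "sample_source": "recorded by the speaker",
        "permitted_use": "internal demo narration",
        "requested_by": "Alex Example",
    },
    target_locale="en-US",
)
print(result["consent_summary"])
```

Only the voice ID and the consent summary are carried forward; the raw sample audio is deliberately not part of the output.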
