voice-conversion-studio

voice-conversion-studio transforms a local audio file into a different voice using an authorized audio provider. Use it when users request voice conversion, voice changing, or narration replacement, provided the source recording has speaker consent and copyright clearance, and the target voice is either provider-licensed or properly authorized.

Repository: opensquilla

Install in Claude Code

git clone --depth 1 https://github.com/opensquilla/opensquilla /tmp/voice-conversion-studio && cp -r /tmp/voice-conversion-studio/src/opensquilla/skills/bundled/voice-conversion-studio ~/.claude/skills/voice-conversion-studio

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# voice-conversion-studio

Converts an existing local recording into a target voice using the configured
audio provider. OpenRouter can assist with planning or file naming, but the
conversion itself must use `voice_convert`.

## Request triage

Before calling tools, extract these fields from the user request:

- source audio path and whether it is local, intentional, and user-provided
- source rights: speaker consent and recording copyright
- target voice: provider-licensed voice, cloned voice ID, or user-provided
  voice ID
- target language, target locale, desired accent, emotion, pace, and output
  format
- output expectation: quick conversion sample, final asset, or multiple takes

OpenRouter can help summarize or translate instructions, but it is not an
audio provider and cannot authorize voice identity use.
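The triage fields listed above could be captured in a small record before any tool call. This is only a sketch: `ConversionRequest`, `open_questions`, and every field name are hypothetical and are not defined by the skill.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConversionRequest:
    """Hypothetical record of the triage fields; names are illustrative only."""
    source_audio: Optional[str] = None      # local path supplied by the user
    source_is_local: bool = False
    source_is_intentional: bool = False
    speaker_consent: bool = False
    recording_rights: bool = False
    target_voice: Optional[str] = None      # licensed, cloned, or user-provided voice ID
    target_language: Optional[str] = None
    target_locale: Optional[str] = None     # e.g. "en-GB"
    accent: Optional[str] = None
    emotion: Optional[str] = None
    pace: Optional[str] = None
    output_format: Optional[str] = None
    output_expectation: str = "sample"      # "sample", "final", or "takes"


def open_questions(req: ConversionRequest) -> List[str]:
    """List what still has to be asked before any tool call is made."""
    questions = []
    if not req.source_audio:
        questions.append("source audio path")
    elif not (req.source_is_local and req.source_is_intentional):
        questions.append("confirmation that the source is local and intentionally provided")
    if not req.speaker_consent:
        questions.append("speaker consent")
    if not req.recording_rights:
        questions.append("recording copyright")
    if not req.target_voice:
        questions.append("target voice")
    if not req.target_language:
        questions.append("target language")
    return questions
```

An empty list from `open_questions` would mean triage is complete; anything else is a prompt to ask the user rather than guess.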

## Required workflow

1. Check that the source file is local and was intentionally provided.
2. Confirm rights for both sides:
   - source recording copyright and speaker authorization
   - target voice consent or provider-licensed voice
3. Refuse requests to imitate a public figure or a copyrighted character.
4. Use `audio_provider_capabilities` if conversion availability is uncertain.
5. Call `voice_convert` with `source_audio`, `voice`, optional `output_path`,
   and any supported provider controls.
6. Return the result as a playable audio artifact when the surface supports it.
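The ordering of the steps above can be sketched in Python. The real `voice_convert` and `audio_provider_capabilities` are agent tools whose exact signatures are not given here, so this sketch takes them as injected callables. The function name `run_conversion`, the boolean flags, the shape of the capabilities result, and the `refused` status are all assumptions made for illustration.

```python
import os
from typing import Any, Callable, Dict, Optional


def run_conversion(
    source_audio: str,
    voice: str,
    *,
    source_rights_ok: bool,
    target_rights_ok: bool,
    protected_identity: bool,
    voice_convert: Callable[..., Dict[str, Any]],
    capabilities: Optional[Callable[[], Dict[str, Any]]] = None,
    output_path: Optional[str] = None,
) -> Dict[str, Any]:
    # Step 1: the source must be an existing local file, not a URL.
    if "://" in source_audio or not os.path.isfile(source_audio):
        return {"status": "refused", "reason": "source must be a local file"}
    # Step 2: rights must be confirmed on both sides.
    if not (source_rights_ok and target_rights_ok):
        return {"status": "refused", "reason": "rights not confirmed"}
    # Step 3: no public figure or copyrighted character targets.
    if protected_identity:
        return {"status": "refused", "reason": "protected identity target"}
    # Step 4: only consult capabilities when a checker was supplied.
    # Assumed shape: a dict with a boolean "voice_convert" key.
    if capabilities is not None and not capabilities().get("voice_convert", False):
        return {"status": "not_available", "note": "conversion not offered"}
    # Step 5: pass output_path only when the caller provided one.
    kwargs: Dict[str, Any] = {"source_audio": source_audio, "voice": voice}
    if output_path is not None:
        kwargs["output_path"] = output_path
    return voice_convert(**kwargs)
```

The point of the sketch is that every refusal happens before the provider is contacted, so no quota is spent on a request that should not run.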

## Preview-first

When source quality, accent transfer, or target voice fit is uncertain, convert
a short sample before processing a full recording. Recommend re-recording or
cleaning the source if the preview contains room echo, background music, strong
dialect mismatch, or heavy code-switching.

For multilingual conversion, avoid using a target voice that does not naturally
support the target language. A short preview is the fastest way to catch odd
accent transfer before spending quota on the whole asset.
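One way to produce a short sample is to copy the first few seconds of the source before sending anything to the provider. The sketch below assumes the source is an uncompressed PCM WAV file and uses only Python's standard `wave` module. The helper name `write_preview` and the 10-second default are illustrative choices, not part of the skill.

```python
import wave


def write_preview(src_path: str, dst_path: str, seconds: float = 10.0) -> int:
    """Copy the first `seconds` of a PCM WAV file; return the frames written."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Never request more frames than the file contains.
        n_frames = min(src.getnframes(), int(seconds * src.getframerate()))
        frames = src.readframes(n_frames)
    with wave.open(dst_path, "wb") as dst:
        # Reuse channel count, sample width, and rate from the source;
        # the frame count in the header is corrected on write.
        dst.setparams(params)
        dst.writeframes(frames)
    return n_frames
```

The preview file can then be passed as `source_audio` in place of the full recording. Other formats such as MP3 or FLAC would need a different tool to trim.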

## Tool-result handling

- If `voice_convert` returns `status=ok`, return the playable artifact/path
  first, then target voice, mime type, and rights summary.
- If it returns `consent_required`, ask for source and target consent metadata
  instead of attempting a different voice identity.
- If it returns `not_available`, quote the `note` and distinguish provider
  setup, feature gating, key/quota limits, file format, and source path issues.
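The three cases above amount to a dispatch on the result's status. The sketch below assumes the tool result is a dictionary. Only `status` and `note` are named in this document; the keys `output_path`, `voice`, `mime_type`, and `rights_summary` and the function name `describe_result` are hypothetical.

```python
from typing import Any, Dict


def describe_result(result: Dict[str, Any]) -> str:
    """Turn a voice_convert-style result into a user-facing message."""
    status = result.get("status")
    if status == "ok":
        # Artifact/path first, then voice, mime type, and rights summary.
        return "\n".join([
            f"Audio: {result.get('output_path')}",
            f"Voice: {result.get('voice')}",
            f"MIME type: {result.get('mime_type')}",
            f"Rights: {result.get('rights_summary')}",
        ])
    if status == "consent_required":
        # Ask for consent metadata; do not retry with a different identity.
        return ("Consent metadata is needed for both the source speaker "
                "and the target voice before converting.")
    if status == "not_available":
        note = result.get("note", "")
        return (f'Provider note: "{note}". Check provider setup, feature '
                "gating, key/quota limits, file format, and the source path.")
    return f"Unrecognized status: {status!r}"
```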

## Rights and copyright guard

- Authorization (授权) is required for both the source speaker and the target voice.
- Copyright (版权): do not convert songs, movie lines, podcasts, audiobooks, lectures,
  interviews, or game/animation dialogue unless the user confirms they hold the rights.
- Public figure policy: do not convert a recording to sound like a public
  figure, celebrity, actor, singer, politician, influencer, or fictional
  character.
- If the user asks for a risky identity target, offer a non-identifying target:
  "mature calm Mandarin narrator", "bright young commercial voice", etc.

## Locale and accent quality notes

For voice conversion, first identify the target language and locale. Both the
source recording and the target voice should be compatible with the accent
expected for that locale.

- Chinese neutral narration: prefer a clean Mandarin (普通话) source and target voice.
- English: preserve requested locale such as en-US, en-GB, en-AU, en-IN, or
  en-SG.
- Japanese/Korean/French/German/Spanish/etc.: prefer source/target voices that
  naturally support that language.
- Strong dialect, background music, reverberation, and heavy code-switching can
  cause odd accent transfer. Recommend re-recording a short, dry sample before
  converting a whole script.
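If a provider exposes the locales a voice supports, the notes above suggest a three-way check: exact locale match, same language under a different locale, or no support. The sketch below assumes such a list is available, which this document does not state. The function name `locale_fit` and its return labels are hypothetical.

```python
from typing import List


def locale_fit(requested: str, supported: List[str]) -> str:
    """Classify how well a voice's locales cover the requested locale."""
    req = requested.strip().lower()
    sup = [s.strip().lower() for s in supported]
    if req in sup:
        return "exact"            # e.g. en-GB requested, en-GB offered
    language = req.split("-")[0]
    if any(s.split("-")[0] == language for s in sup):
        return "language_only"    # same language, different accent: preview first
    return "unsupported"          # voice does not naturally cover the language


print(locale_fit("en-GB", ["en-US", "en-GB"]))  # exact
print(locale_fit("en-IN", ["en-US"]))           # language_only
print(locale_fit("ja-JP", ["en-US", "fr-FR"]))  # unsupported
```

A `language_only` result is the case where a short preview is most useful, since the requested accent may not survive conversion.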

## Output contract

Return:

- provider
- target voice
- output path
- mime type
- playable audio artifact status
- rights/consent summary
- target language / locale assumption
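The contract above can be mirrored by a small record so that no field is silently omitted. This is a sketch under assumed names: `ConversionReport`, `to_payload`, and the field identifiers are illustrative, and the skill does not prescribe a serialization format.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class ConversionReport:
    """Hypothetical record with one field per item in the output contract."""
    provider: str
    target_voice: str
    output_path: str
    mime_type: str
    playable: bool              # whether a playable artifact was attached
    rights_summary: str
    locale_assumption: str      # e.g. "en-GB"


def to_payload(report: ConversionReport) -> Dict[str, Any]:
    """Return the report as a dict, rejecting blank text fields."""
    payload = asdict(report)
    blank = [key for key, value in payload.items() if value == ""]
    if blank:
        raise ValueError("missing contract fields: " + ", ".join(blank))
    return payload
```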
