25-voice-clone-podcast-global
# Voice Clone & Podcast, Audio AI for Personal Brand (Global)

This Claude Code skill generates synthetic voices from text using AI platforms like ElevenLabs, Murf, and PlayHT to produce podcasts, audiobooks, and voiceovers in multiple English accents (US, UK, AU, SG). Use it to create long-form audio content (30-60 minute podcasts or full audiobooks) without on-camera recording, or to rapidly repurpose a single podcast episode into ten short-form video clips for social media distribution. It is ideal when you need high-volume branded audio content quickly while maintaining consistent voice quality across formats.
git clone --depth 1 https://github.com/minhnv0807/ai-business-skills /tmp/25-voice-clone-podcast-global && cp -r /tmp/25-voice-clone-podcast-global/modules/personal-branding/en/25-voice-clone-podcast-global ~/.claude/skills/25-voice-clone-podcast-global

SKILL.md
# Voice Clone & Podcast: Audio AI for Personal Brand (Global)

> **This skill focuses on audio AI**: voice clone, podcast, audiobook, voiceover.
> Pairs with `24-ai-avatar-production-global` (video); combine both for full content stack coverage.

---

## 1. Newbie Guide

### What is audio AI and how is it different from video AI?

Audio AI is the tech behind synthetic voices that sound nearly human. From a sample of your voice, the AI learns and produces a synthetic clone (voice clone). You write text -> AI reads it back (Text-to-Speech).

**Differences vs video AI:**
- Video AI (skill 24): produces video with face + voice -> talking head, social video
- Audio AI (this skill): produces voice only -> podcast, audiobook, voiceover, narration

### When to use audio AI instead of video?

| Situation | Pick audio AI | Pick video AI |
|-----------|---------------|---------------|
| Long-form content (>10 min) | YES: podcast format | NO: too long for video |
| Don't want to be on camera | YES | NO |
| Need volume content fast | YES: 1 podcast = 10 shorts | YES, but more expensive |
| Audience listens while driving / at gym | YES | NO |
| Need visuals to demo | NO | YES |
| Personal brand thought leader | YES: podcast = authority | YES: if a face brand exists |

### Main tools (international)

- **ElevenLabs:** Best in class for voice clone, with top-tier English voices (US/UK/AU/IN) and 30+ languages
- **Murf:** 120+ voice library, strong for corporate voiceover, multilingual
- **PlayHT:** API-friendly, instant clone, 800+ voices
- **HeyGen Voice:** Bundles with HeyGen avatars for a seamless voice + video pipeline
- **Descript:** AI editing (cut audio by editing text), voice clone (Overdub)
- **Resemble.ai:** Custom emotion control, brand-grade APIs
- **Riverside:** Studio-quality podcast recording, with AI Magic Clips for repurposing

### Time and cost

| Task | Time | Cost (USD/mo) |
|------|------|---------------|
| Voice clone setup | 30-60 min | $5-22 (ElevenLabs Starter/Pro) |
| 60s voiceover (TikTok) | 5-10 min | $5-22 |
| 30 min solo podcast | 1-2 hrs | $22-99 (ElevenLabs + Riverside) |
| Audiobook chapter (15 min) | 30-45 min | $22-99 |
| 1 podcast -> 10 clips | 1-2 hrs | $0-30 (Descript/Opus Clip) |

### 5 common mistakes

1. **AI voice sounds robotic:** the sample is too short or monotone. Fix: re-record 3-5 minutes with varied emotions (happy, serious, sad).
2. **Mispronounced names/jargon:** TTS engines mishandle proper nouns. Fix: use phonetic spelling (e.g., "Anthropic" -> "an-THROP-ik") in the script.
3. **Audio clipping:** levels too hot. Fix: target -3 dB peak, -16 LUFS loudness.
4. **Background noise/echo:** untreated room. Fix: record in a small room with curtains and rugs, or apply NVIDIA Broadcast / Krisp / Adobe Enhance Speech.
5. **Boring podcast:** no editing, too many "ums". Fix: let Descript auto-remove filler words and add light background music (-25 dB).

---

## 2. Information collection

Ask up to 4 questions before starting:

1. **Main use case?** Short voiceover (TikTok/Reels) / Podcast 30-60 min / Audiobook?
2. **Language(s)?** English (US/UK/AU/IN) / multilingual / single non-English?
3. **Total length?** <60s / 5-30 min / 30-60 min / >60 min (audiobook)?
4. **Budget tier?** Free ($0) / Starter ($5-22) / Pro ($22-99) / Business ($99+)?

> Based on the answers, pick the appropriate use case + tool stack.

---

## 3. Voice clone setup

### Sample requirements

| Criterion | Minimum | Optimal |
|-----------|---------|---------|
| Length | 1 min (Free tier) | 3-5 min (Pro tier) |
| Room | Quiet, no echo | Acoustic treatment, rugs, curtains |
| Mic | Phone + headset mic | Condenser mic (AT2020, $80-100) |
| Distance | 20-30 cm | 15-20 cm with pop filter |
| Format | MP3 128 kbps | WAV 44.1 kHz |
| Content | One pre-written passage | Three passages: business / casual / emotional |

> **Full reference:** `references/voice-clone-prompts-global.md`, with sample scripts across English variants (US/UK/AU/SG/IN) and 3 topics (business / lifestyle / educational).

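
The length and format rows of the sample requirements can be checked locally before a recording is uploaded to any cloning tool. The following is a minimal sketch using only the Python standard library, not part of any vendor's tooling. The function name `check_voice_sample`, the returned keys, and the choice to reuse the -3 dB peak target from the clipping fix as a ceiling for the sample are all assumptions made for illustration. It only reads 16-bit PCM WAV files and does not measure loudness in LUFS, which needs a dedicated meter.

```python
import math
import struct
import wave

MIN_SECONDS = 60        # "Minimum" length row: 1 min
OPTIMAL_SECONDS = 180   # lower bound of the "Optimal" 3-5 min range
TARGET_RATE_HZ = 44_100 # "Optimal" format row: WAV 44.1 kHz
PEAK_CEILING_DB = -3.0  # assumption: reuse the -3 dB peak target for samples


def check_voice_sample(path):
    """Report duration, sample rate, and peak level of a 16-bit PCM WAV file."""
    peak = 0
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        rate = wav.getframerate()
        frames = wav.getnframes()
        while True:
            chunk = wav.readframes(65536)
            if not chunk:
                break
            # Samples are little-endian signed 16-bit, channels interleaved.
            values = struct.unpack(f"<{len(chunk) // 2}h", chunk)
            peak = max(peak, max(abs(v) for v in values))

    duration = frames / rate
    # Peak relative to full scale (32768), expressed in dBFS.
    peak_dbfs = 20 * math.log10(peak / 32768) if peak else float("-inf")
    return {
        "duration_s": duration,
        "sample_rate_hz": rate,
        "peak_dbfs": peak_dbfs,
        "meets_minimum_length": duration >= MIN_SECONDS,
        "meets_optimal_length": duration >= OPTIMAL_SECONDS,
        "is_44k1": rate == TARGET_RATE_HZ,
        "peak_ok": peak_dbfs <= PEAK_CEILING_DB,
    }
```

A call such as `check_voice_sample("my_voice.wav")` (a hypothetical file name) returns a dictionary; any `False` flag points at the table row or mistake to revisit before re-recording.
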
### Tool comparison (global)

| Tool | English clone quality | Price (USD/mo) | Setup time | Best for |
|------|----------------------|----------------|------------|----------|
| **ElevenLabs Pro** | Excellent (10/10) | $22 | 30 min | Multilingual, content creator |
| **HeyGen Voice** | Good (8/10) | Bundled with avatar | 15 min | Combo with video AI |
| **Murf** | Excellent (9/10) | $29-79 | 30 min | Corporate voiceover, e-learning |
| **PlayHT** | Excellent (9.5/10) | $39-99 | 30 min | API-driven, instant clone |
| **Descript Overdub** | Good (8/10) | $24 (Hobbyist) | 30 min | Podcast editing |
| **Resemble.ai** | Excellent (9/10) | $30-99 | 1 hr | Brand custom voice, emotion control |

**Recommendations:**
- **English-only creator:** ElevenLabs Pro ($22), the best balance of quality and price
- **Multilingual creator:** ElevenLabs Pro (30+ languages built in)
- **Combo with video:** HeyGen (single platform: voice + avatar)
- **Brand/agency at scale:** Resemble.ai or PlayHT (API + custom emotion)

### Consent form template

```
VOICE CLONE LICENSE AGREEMENT

I, [Full name], ID/passport: [number], grant [Brand/Company]:

1. Permission to use samples of my voice to create an AI voice clone.
2. Use of the voice clone in [scope: internal / advertising / podcast / etc.].
3. Term: from [DD/MM/YYYY] to [DD/MM/YYYY].
4. Right of withdrawal: I may request deletion of the voice clone at any time
   in writing; the brand has 7 days to fully remove it.
5. Disclosure: the brand commits to disclose "AI-generated voice" wherever
   required by applicable law (FTC, EU AI Act, etc.).

Signed: ____________ Date: ____________
```

---

## 4. Three use cases

### Use case A: Short voiceover for TikTok/Reels (Energetic)

**Spec:**
- Length: 15-60s
- Pace: fast (180-220 wpm), suited to a younger English-speaking audience
- Tone: energetic, slightly higher pitch, exciting
- Audio levels: -14 LUFS (TikTok), peak -1 dB
- CTA: clear in the last 5 seconds

**Script template (30s):**

```
[HOOK 0-3s] "Did
```
Channel operations agent: channel setup, landing page briefs, email marketing, social listening
Content production agent: script writing, copy, creator briefs, content scheduling
Marketing strategy agent: planning, market research, competitor analysis, brand strategy development
Performance analysis agent: reading data, evaluating campaigns, calculating KPIs, reporting
Personal brand building agent with AI Avatar: strategy, content engine, monetization, community for founders/coaches/creators
Full dropshipping pipeline for US/EU/global markets — product research (winning criteria, Minea, PiPiAds), supplier sourcing (AliExpress, CJ Dropshipping, Spocket, Zendrop), Shopify store setup (themes, apps), ad creative pipeline (10 ads/week methodology, UGC pattern), audience targeting (interest stacking, lookalike, broad), pricing math (3-5x markup, BE-ROAS), customer service (long shipping, refunds), scaling playbook (CBO, vertical), compliance (FTC, EU CHRD). Trigger: 'dropshipping', 'shopify store', 'AliExpress', 'winning product', 'Facebook ads dropship', 'TikTok ads dropship', 'Shopify conversion'.
Foundation skill for the global personal brand cluster. Creates `.agents/personal-brand-context-global.md` with region-specific personal brand context. It has 4 region variants (US/EU/SEA/LATAM), each covering founder, coach, and creator profiles. Read it BEFORE the other PB skills (23-28 global). Trigger: 'global personal brand', 'international personal brand', 'US founder brand', 'EU coach brand', 'creator economy global'.