Skill356 repo starsupdated 18d ago

elevenlabs

This ElevenLabs skill converts text and documents into high-quality audio through two modes: single-voice narration for documents and two-host conversational podcast generation. Use it when users request text-to-speech conversion, document narration, or podcast creation from written content, with automatic text chunking, voice continuity, and audio concatenation handled through the ElevenLabs API.

View source Repository: ai-skills

Install in Claude Code

Copy

git clone --depth 1 https://github.com/sanjay3290/ai-skills /tmp/elevenlabs && cp -r /tmp/elevenlabs/skills/elevenlabs ~/.claude/skills/elevenlabs

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# ElevenLabs - Text-to-Speech & Podcast Skill

## Overview

This skill converts text and documents into high-quality audio using ElevenLabs TTS API. It supports two modes: single-voice narration and two-host conversational podcast generation.

## When to Use This Skill

Activate when the user mentions:
- "create podcast", "generate podcast", "podcast from document"
- "narrate document", "narrate this file", "read aloud"
- "text to speech", "TTS", "convert to audio"
- "audio from document", "audio version of"

## Setup

Config at `skills/elevenlabs/config.json`:
```json
{
  "api_key": "your-elevenlabs-api-key",
  "default_voice": "JBFqnCBsd6RMkjVDRZzb",
  "default_model": "eleven_multilingual_v2",
  "podcast_voice1": "JBFqnCBsd6RMkjVDRZzb",
  "podcast_voice2": "EXAVITQu4vr4xnSDxMaL"
}
```

Only `api_key` is required. Or set `ELEVENLABS_API_KEY` env var.

Dependencies: `pip install PyPDF2 python-docx` (only needed for PDF/DOCX files).

Requires `ffmpeg` for multi-chunk narration and podcasts.

## Commands

### List Voices

```bash
python skills/elevenlabs/scripts/elevenlabs.py voices
python skills/elevenlabs/scripts/elevenlabs.py voices --json
```

Use this to find voice IDs for the user.

### Single-Voice TTS

```bash
# From text
python skills/elevenlabs/scripts/elevenlabs.py tts --text "Hello world" --output ~/Downloads/hello.mp3

# From document
python skills/elevenlabs/scripts/elevenlabs.py tts --file /path/to/doc.pdf --output ~/Downloads/narration.mp3

# With specific voice
python skills/elevenlabs/scripts/elevenlabs.py tts --file doc.md --voice VOICE_ID --output out.mp3
```

The script handles text extraction, chunking at sentence boundaries (~4000 chars), TTS per chunk with voice continuity, and ffmpeg concatenation automatically.

### Podcast Generation

Podcast mode requires a JSON script file with conversation segments:

```json
[
  {"speaker": "host1", "text": "Welcome to our podcast! Today we're diving into..."},
  {"speaker": "host2", "text": "That's right! I found the section on..."},
  {"speaker": "host1", "text": "Let's break that down..."}
]
```

```bash
python skills/elevenlabs/scripts/elevenlabs.py podcast --script /tmp/script.json --voice1 ID1 --voice2 ID2 --output ~/Downloads/podcast.mp3
```

## Podcast Workflow (for Claude)

When the user asks to create a podcast from a document:

1. **Extract the document text**:
   ```bash
   python skills/elevenlabs/scripts/extract.py /path/to/document.pdf
   ```

2. **Generate a two-host conversation script** from the extracted text. Follow these guidelines:
   - Write as a natural, engaging discussion between two hosts
   - Host 1 typically leads/introduces topics, Host 2 adds analysis and reactions
   - Start with a brief intro welcoming listeners and stating the topic
   - End with a summary/outro
   - Keep each turn under 3000 characters
   - Vary turn lengths - mix short reactions with longer explanations
   - Use conversational language: "That's a great point", "What I found interesting was..."
   - Reference specific details from the source document
   - Avoid reading the document verbatim - discuss and interpret it

3. **Write the script** as a JSON array to a temp file:
   ```python
   # Write to /tmp/podcast_script.json
   [
     {"speaker": "host1", "text": "Welcome to today's episode..."},
     {"speaker": "host2", "text": "Thanks for having me..."},
     ...
   ]
   ```

4. **Generate the podcast**:
   ```bash
   python skills/elevenlabs/scripts/elevenlabs.py podcast --script /tmp/podcast_script.json --output ~/Downloads/podcast.mp3
   ```

5. **Clean up** the temp script file.

## Tips

- Run `voices` first to let the user pick voices they like
- For podcasts, suggest voice pairs with contrasting qualities (e.g., one deep, one bright)
- Default output to `~/Downloads/` unless the user specifies otherwise
- For large documents, warn the user about character usage on their ElevenLabs plan