Skill · 318 repo stars · updated 1mo ago
google-tts
This Claude Code skill converts text and documents into natural-sounding audio using Google Cloud's Text-to-Speech API, supporting multiple voice types and 40+ languages. Use it to generate narrations from markdown or PDF files, create multi-speaker podcasts from JSON scripts, or synthesize individual text passages into MP3 or other audio formats with customizable voice, speed, and pitch parameters.
Install in Claude Code
`git clone --depth 1 https://github.com/sanjay3290/ai-skills /tmp/google-tts && cp -r /tmp/google-tts/skills/google-tts ~/.claude/skills/google-tts`
Then start a new Claude Code session; the skill loads automatically.
Definition
SKILL.md
# Google Cloud Text-to-Speech
Converts text and documents into audio using Google Cloud TTS API. Supports Neural2, WaveNet, Studio, and Standard voices across 40+ languages.
## Setup
Provide the API key through the `GOOGLE_TTS_API_KEY` environment variable or in `skills/google-tts/config.json` as `{"api_key": "..."}`.
Requires `ffmpeg` for multi-chunk documents. Optional: `pip install PyPDF2 python-docx` for PDF/DOCX.
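The two key sources described above could be resolved along these lines. This is a minimal sketch, not the skill's actual code: the function name `load_api_key` is hypothetical, and checking the environment variable before the config file is an assumption, since the original does not state the precedence.

```python
import json
import os
from pathlib import Path


def load_api_key(config_path="skills/google-tts/config.json", env=None):
    """Return the API key from the environment, else from config.json, else None.

    Hypothetical helper: the env-before-file order is an assumption.
    """
    env = os.environ if env is None else env
    key = env.get("GOOGLE_TTS_API_KEY")
    if key:
        return key
    path = Path(config_path)
    if path.is_file():
        with path.open(encoding="utf-8") as fh:
            # config.json is expected to look like {"api_key": "..."}
            return json.load(fh).get("api_key")
    return None
```

Returning `None` instead of raising lets the caller decide how to report a missing key.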
## Commands
### List Voices
```bash
python skills/google-tts/scripts/google_tts.py voices --language en-US --type Neural2
python skills/google-tts/scripts/google_tts.py voices --json
```
### Text-to-Speech
```bash
# From text or document (PDF, DOCX, MD, TXT)
python skills/google-tts/scripts/google_tts.py tts --text "Hello world" --output ~/Downloads/hello.mp3
python skills/google-tts/scripts/google_tts.py tts --file /path/to/doc.pdf --output ~/Downloads/narration.mp3
# With voice, rate, pitch, encoding options
python skills/google-tts/scripts/google_tts.py tts --file doc.md --voice en-US-Neural2-F --rate 0.9 --encoding MP3 --output ~/Downloads/out.mp3
```
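For orientation, the options above correspond to fields of the request body accepted by the public Cloud Text-to-Speech REST method `text:synthesize`. The sketch below only builds that body; `build_synthesize_request` is a hypothetical name, the defaults are illustrative, and how `google_tts.py` assembles its own requests is not shown in this document.

```python
def build_synthesize_request(text, voice="en-US-Neural2-F", rate=1.0,
                             pitch=0.0, encoding="MP3"):
    """Build a JSON-serializable body for
    POST https://texttospeech.googleapis.com/v1/text:synthesize

    Hypothetical helper; range checks mirror the limits listed in this document.
    """
    if not 0.25 <= rate <= 4.0:
        raise ValueError("rate must be between 0.25 and 4.0")
    if not -20.0 <= pitch <= 20.0:
        raise ValueError("pitch must be between -20.0 and 20.0 semitones")
    if len(text.encode("utf-8")) > 5000:
        raise ValueError("text exceeds 5000 bytes; split it into chunks first")
    # Voice names start with the language code, e.g. "en-US-Neural2-F" -> "en-US".
    language_code = "-".join(voice.split("-")[:2])
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice},
        "audioConfig": {
            "audioEncoding": encoding,
            "speakingRate": rate,
            "pitch": pitch,
        },
    }
```

The API responds with base64-encoded audio in an `audioContent` field, which a caller would decode and write to the output file.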
### Podcast Generation
Takes a JSON script with alternating speakers and synthesizes each speaker with a different voice.
```json
[
{"speaker": "host1", "text": "Welcome to our podcast!"},
{"speaker": "host2", "text": "Thanks for having me..."}
]
```
```bash
python skills/google-tts/scripts/google_tts.py podcast --script /tmp/script.json --output ~/Downloads/podcast.mp3
python skills/google-tts/scripts/google_tts.py podcast --script /tmp/script.json --voice1 en-US-Neural2-J --voice2 en-US-Neural2-H --rate 0.9 --output ~/Downloads/podcast.mp3
```
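One way to picture the speaker-to-voice pairing is shown below. It is a sketch under an assumption: the first distinct speaker in the script gets the first voice and the second distinct speaker gets the second. The original does not say how the script matches speakers to `--voice1` and `--voice2`, and `assign_voices` is a hypothetical name.

```python
def assign_voices(script, voice1, voice2):
    """Pair each turn with a voice, in order of each speaker's first appearance.

    Hypothetical helper; the real script's mapping rule may differ.
    """
    available = [voice1, voice2]
    voices = {}
    for turn in script:
        speaker = turn["speaker"]
        if speaker not in voices:
            if len(voices) >= len(available):
                raise ValueError(f"more than {len(available)} speakers in script")
            voices[speaker] = available[len(voices)]
    # Return (voice, text) pairs in the original turn order.
    return [(voices[turn["speaker"]], turn["text"]) for turn in script]
```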
## Workflow
### Single-Voice Narration
1. If user provides a file path, use `--file`. For generated content, write clean prose to `/tmp/tts_input.md` first.
2. Default voice: `en-US-Neural2-D` (male) or `en-US-Neural2-F` (female). Use Neural2 for best quality/cost balance.
3. Generate: `python skills/google-tts/scripts/google_tts.py tts --file /tmp/tts_input.md --output ~/Downloads/recording.mp3`
4. Report file location and size. Default output to `~/Downloads/`.
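The reporting step could look roughly like this. It is a small sketch using only the standard library; `describe_output` is a hypothetical name and the size formatting is an illustrative choice, not something the skill prescribes.

```python
from pathlib import Path


def describe_output(path):
    """Return '<resolved path> (<human-readable size>)' for a generated file."""
    p = Path(path).expanduser()  # expands a leading "~", e.g. ~/Downloads/...
    size = p.stat().st_size      # raises FileNotFoundError if generation failed
    if size >= 1024 * 1024:
        human = f"{size / (1024 * 1024):.1f} MB"
    elif size >= 1024:
        human = f"{size / 1024:.1f} KB"
    else:
        human = f"{size} bytes"
    return f"{p} ({human})"
```

For example, `describe_output("~/Downloads/recording.mp3")` would return the expanded path followed by the file size in parentheses.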
### Podcast from Document
1. Extract text: `python skills/google-tts/scripts/extract.py /path/to/document.pdf`
2. Generate a two-host conversation script as JSON:
   - Natural discussion, not verbatim reading. Host 1 leads, Host 2 reacts/analyzes.
   - Include intro and outro. Vary turn lengths. Keep turns under 4000 chars.
3. Write script to `/tmp/podcast_script.json`
4. Generate: `python skills/google-tts/scripts/google_tts.py podcast --script /tmp/podcast_script.json --output ~/Downloads/podcast.mp3`
5. Clean up temp files.
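Before step 3 writes the script to disk, the constraints from step 2 can be checked mechanically. The following is a minimal sketch, assuming the turn format shown in the Podcast Generation example; `write_podcast_script` and `MAX_TURN_CHARS` are hypothetical names.

```python
import json

MAX_TURN_CHARS = 4000  # "Keep turns under 4000 chars"


def write_podcast_script(turns, path="/tmp/podcast_script.json"):
    """Validate each turn, then write the script as JSON and return the path."""
    for i, turn in enumerate(turns):
        if not turn.get("speaker") or not turn.get("text"):
            raise ValueError(f"turn {i} needs non-empty 'speaker' and 'text'")
        if len(turn["text"]) >= MAX_TURN_CHARS:
            raise ValueError(f"turn {i} has {len(turn['text'])} chars; keep it under {MAX_TURN_CHARS}")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(turns, fh, ensure_ascii=False, indent=2)
    return path
```

Validating before writing means a failed check never leaves a partial script file behind.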
## Reference
- **Recommended voice type**: Neural2 (~$4/1M chars, high quality)
- **Speaking rate**: 0.25-4.0 (0.85-0.95 good for technical content)
- **Pitch**: -20.0 to 20.0 semitones
- **Encodings**: MP3 (default), LINEAR16 (.wav), OGG_OPUS (.ogg)
- API limit: 5000 bytes/request. Script auto-chunks at sentence boundaries.
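Sentence-boundary chunking under a byte limit can be sketched as follows. This is not the script's actual implementation: the sentence-splitting regex, the name `chunk_text`, and the choice to raise an error on a single oversized sentence are all assumptions made for illustration.

```python
import re


def chunk_text(text, max_bytes=5000):
    """Greedily pack whole sentences into chunks of at most max_bytes (UTF-8)."""
    # Split after ., ! or ? followed by whitespace (a simple heuristic).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        if len(sentence.encode("utf-8")) > max_bytes:
            raise ValueError("a single sentence exceeds max_bytes")
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            # The sentence does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Measuring the UTF-8 encoded length rather than the character count matters because the limit is stated in bytes, and non-ASCII text uses more than one byte per character.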