blog-audio
Blog Audio generates professional audio narration of blog posts using Google's Gemini text-to-speech technology. It offers three modes: a 200-300 word summary, full article read-aloud, or two-speaker podcast dialogue, with 30 voice options across 80+ languages and HTML5 embed output. Use this skill when you need to create accessible audio versions of blog content or produce podcast-style episodes from written articles.
Install in Claude Code
`git clone --depth 1 https://github.com/AgriciDaniel/claude-blog /tmp/blog-audio && cp -r /tmp/blog-audio/skills/blog-audio ~/.claude/skills/blog-audio`
Then open a new Claude Code session; the skill loads automatically.
Definition
SKILL.md
# Blog Audio: Gemini TTS Narration for Blog Posts
Generate professional audio narration of blog content using Google's Gemini TTS.
Three modes: summary (200-300 word spoken overview), full article read-aloud,
or two-speaker podcast dialogue. 30 voices, 80+ languages, HTML5 embed output.
## Quick Reference
| Command | What it does |
|---------|-------------|
| `/blog audio generate <file>` | Generate audio narration of a blog post |
| `/blog audio voices` | Show available voices with characteristics |
| `/blog audio setup` | Check/configure API key for Gemini TTS |
## Prerequisites
- Python 3.11+ (venv managed automatically by `run.py`)
- `GOOGLE_AI_API_KEY` environment variable (same key used by blog-image)
- FFmpeg (for WAV-to-MP3 conversion; falls back to WAV if missing)
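The Python and FFmpeg prerequisites can be checked up front. The following is a minimal sketch, not part of the skill's scripts: `check_runtime` and its parameters are hypothetical names, and the `output_extension` field simply mirrors the WAV fallback described above.

```python
import shutil
import sys


def check_runtime(which=shutil.which, version=None):
    """Report whether Python and FFmpeg meet the listed prerequisites.

    Hypothetical helper: `which` and `version` are injectable only so the
    check can be exercised without touching the real system.
    """
    version = sys.version_info[:2] if version is None else tuple(version)
    ffmpeg_found = which("ffmpeg") is not None
    return {
        "python_ok": version >= (3, 11),
        "ffmpeg_found": ffmpeg_found,
        # Without FFmpeg the skill falls back to WAV instead of MP3.
        "output_extension": ".mp3" if ffmpeg_found else ".wav",
    }


if __name__ == "__main__":
    print(check_runtime())
```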
## Always Use run.py Wrapper
```bash
# CORRECT:
python3 scripts/run.py generate_audio.py --text "..." --voice Charon --json
# WRONG:
python3 scripts/generate_audio.py --text "..." # Fails without venv
```
## API Key Check (Gate Pattern)
Before generating audio, check for the API key:
```bash
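# Report whether the key is set without printing the secret itself
[ -n "$GOOGLE_AI_API_KEY" ] && echo "set" || echo "not set"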
```
- If set: proceed with generation
- If not set: guide the user:
"Audio generation requires a Google AI API key. Get one free at https://aistudio.google.com/apikey
Then set it: `export GOOGLE_AI_API_KEY=your-key`
This is the same key used by `/blog image`: if image generation works, audio works too."
- **When called internally** (from blog-write): return silently if key is missing.
Never block the writing workflow.
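The gate logic above can be sketched in Python. This is an illustration under assumptions: `api_key_gate` and its `internal_call` flag are hypothetical names, not functions provided by the skill. The guidance text is taken from the message quoted above.

```python
import os

SETUP_MESSAGE = (
    "Audio generation requires a Google AI API key. "
    "Get one free at https://aistudio.google.com/apikey\n"
    "Then set it: export GOOGLE_AI_API_KEY=your-key"
)


def api_key_gate(internal_call=False, env=None):
    """Return (proceed, message) for the API key gate.

    Hypothetical helper. `internal_call=True` models a call from
    blog-write, where a missing key must not produce any output.
    """
    env = os.environ if env is None else env
    if env.get("GOOGLE_AI_API_KEY", "").strip():
        return True, None
    if internal_call:
        # Return silently so the writing workflow is never blocked.
        return False, None
    return False, SETUP_MESSAGE
```

A caller would generate audio only when the first element is `True`, and show the message to the user only when it is not `None`.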
## Setup
For `/blog audio setup`:
1. Check if `GOOGLE_AI_API_KEY` is set in environment
2. If blog-image is configured (check `.mcp.json`), the key is already available
3. If not, guide user to https://aistudio.google.com/apikey
4. Verify with a dry run: `python3 scripts/run.py generate_audio.py --text "Test" --dry-run --json`
## Voice Selection
For `/blog audio voices`:
Load `references/voices.md` and present the voice catalog to the user.
Ask the user which voice they prefer, or recommend based on content type:
- **Article narration**: Charon (Informative) or Sadaltager (Knowledgeable)
- **Tutorial/how-to**: Achird (Friendly) or Sulafat (Warm)
- **News/analysis**: Rasalgethi (Informative) or Schedar (Even)
- **Lifestyle/wellness**: Aoede (Breezy) or Vindemiatrix (Gentle)
- **Dialogue host**: Puck (Upbeat) or Laomedeia (Upbeat)
- **Dialogue expert**: Kore (Firm) or Charon (Informative)
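The recommendations above amount to a small lookup table. A possible sketch follows; the dictionary keys and the `recommend_voices` function are hypothetical, and falling back to the article-narration pair for unknown content types is an assumption rather than documented behavior.

```python
# Content type -> (first choice, alternative), copied from the list above.
RECOMMENDED_VOICES = {
    "article": ("Charon", "Sadaltager"),
    "tutorial": ("Achird", "Sulafat"),
    "news": ("Rasalgethi", "Schedar"),
    "lifestyle": ("Aoede", "Vindemiatrix"),
    "dialogue_host": ("Puck", "Laomedeia"),
    "dialogue_expert": ("Kore", "Charon"),
}


def recommend_voices(content_type):
    """Return the recommended voice pair for a content type.

    Assumption: unknown types fall back to the article-narration voices.
    """
    key = content_type.strip().lower()
    return RECOMMENDED_VOICES.get(key, RECOMMENDED_VOICES["article"])
```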
## Generation Workflow
For `/blog audio generate <file>`:
### Step 1: Read the Blog Post
Read the file and extract:
- Title (from H1 or frontmatter)
- Full content (markdown body)
- Approximate word count
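As a rough illustration of this extraction step, the sketch below pulls the title, body, and an approximate word count from a markdown string. `read_post` is a hypothetical helper; it assumes YAML-style frontmatter delimited by `---` lines and prefers the frontmatter `title` over the first H1, which is a choice made here and not something the skill specifies.

```python
import re

FRONTMATTER_RE = re.compile(r"\A---\s*\n(.*?)\n---\s*\n", re.DOTALL)


def read_post(markdown_text):
    """Extract title, body, and approximate word count from a post."""
    title = None
    body = markdown_text
    match = FRONTMATTER_RE.match(markdown_text)
    if match:
        body = markdown_text[match.end():]
        for line in match.group(1).splitlines():
            key, sep, value = line.partition(":")
            if sep and key.strip() == "title":
                title = value.strip().strip("\"'")
                break
    if title is None:
        # Fall back to the first level-1 heading in the body.
        h1 = re.search(r"^#\s+(.+)$", body, re.MULTILINE)
        if h1:
            title = h1.group(1).strip()
    # Whitespace splitting counts markdown symbols too, so this is approximate.
    return {"title": title, "body": body, "word_count": len(body.split())}
```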
### Step 2: Choose Mode
Ask the user (or auto-select if they specified `--mode`):
| Mode | When to use | Output |
|------|-------------|--------|
| **Summary** | Quick audio overview (1-2 min) | 200-300 word spoken summary |
| **Full** | Complete read-aloud (5-15 min) | Full article as natural speech |
| **Dialogue** | Podcast-style (3-8 min) | Two-person conversation about the article |
### Step 3: Prepare Text
**CRITICAL:** Claude prepares the text. The script does TTS only.
**Summary mode:**
Write a 200-300 word spoken summary of the article. Rules:
- Write as natural speech, not written text
- Open with the article's key finding or answer
- Cover 3-5 main takeaways
- Close with actionable advice
- No markdown, no "In this article...", no meta-commentary
- Use conversational transitions ("Here's what matters...", "The key finding is...")
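Some of the summary rules can be checked mechanically before the text is sent to TTS. The sketch below is a hypothetical lint helper (`check_summary` is not part of the skill) covering only the word range, leftover markdown, and the "In this article" opener. The remaining rules, such as tone and takeaways, still need editorial judgment.

```python
import re

# Heading markers, bold markers, backticks, or link syntax.
MARKDOWN_RE = re.compile(r"^#{1,6}\s|\*\*|`|\]\(", re.MULTILINE)


def check_summary(text, low=200, high=300):
    """Return a list of rule violations; an empty list means none were found."""
    issues = []
    words = len(text.split())
    if not low <= words <= high:
        issues.append(f"word count {words} is outside {low}-{high}")
    if MARKDOWN_RE.search(text):
        issues.append("contains markdown syntax")
    if "in this article" in text.lower():
        issues.append("contains meta-commentary ('In this article...')")
    return issues
```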
**Full mode:**
Strip the markdown content to clean spoken text:
- Headings become natural transitions ("Next, let's look at...")
- Links become plain text (remove URLs, keep anchor text)
- Images and charts: omit or briefly describe ("As the data shows...")
- Code blocks: describe verbally ("The code uses a for-loop to...")
- Lists: convert to natural sentences
- Remove frontmatter, schema markup, HTML tags
- Add brief intro: "This is [title], published on [date]."
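Only part of this cleanup is mechanical. The sketch below handles that subset (frontmatter, script blocks such as schema markup, images, links, HTML tags) with regular expressions. `strip_mechanical_markup` is a hypothetical helper; headings, lists, and code blocks are left untouched because rewriting them as natural speech is an authoring task, not a text substitution.

```python
import re


def strip_mechanical_markup(markdown_text):
    """Remove markup that needs no rewriting to become spoken text."""
    # Frontmatter delimited by '---' lines at the very top of the file.
    text = re.sub(r"\A---\s*\n.*?\n---\s*\n", "", markdown_text, flags=re.DOTALL)
    # Script blocks (for example JSON-LD schema markup), including contents.
    text = re.sub(r"<script\b.*?</script>", "", text,
                  flags=re.DOTALL | re.IGNORECASE)
    # Images are omitted entirely.
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)
    # Links keep their anchor text and lose the URL.
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # Remaining HTML tags are dropped; their inner text is kept.
    text = re.sub(r"<[^>]+>", "", text)
    return text.strip()
```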
**Dialogue mode:**
Write a 2-person conversation script about the article:
- Speaker1 = Host (curious, asks good questions)
- Speaker2 = Expert (knowledgeable, gives clear answers)
- Format each line as: `[Speaker1] What's the key takeaway here?`
- Cover the article's main points conversationally
- 15-25 exchanges (produces ~3-8 minutes)
- Natural, not stilted ("That's a great point" over "Indeed, as the research indicates")
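A script in this format can be validated before generation. The following sketch uses hypothetical helpers `parse_dialogue` and `count_exchanges`. It assumes one turn per line and counts an "exchange" as a Speaker1 line immediately followed by a Speaker2 line, which is one reading of the 15-25 guideline.

```python
import re

LINE_RE = re.compile(r"^\[(Speaker1|Speaker2)\]\s+(.+)$")


def parse_dialogue(script):
    """Parse '[SpeakerN] text' lines into (speaker, text) tuples."""
    turns = []
    for number, raw in enumerate(script.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines between turns are allowed
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"line {number} is not in '[SpeakerN] text' form")
        turns.append((match.group(1), match.group(2)))
    return turns


def count_exchanges(turns):
    """Count host lines that are directly answered by the expert."""
    return sum(
        1
        for current, following in zip(turns, turns[1:])
        if current[0] == "Speaker1" and following[0] == "Speaker2"
    )
```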
### Step 4: Select Voice
If the user chose a voice, use it. Otherwise, recommend based on mode:
- Summary/Full: default to Charon (Informative)
- Dialogue: default to Puck (Host) + Kore (Expert)
### Step 5: Generate Audio
Write the prepared text to a temp file, then call:
```bash
# Single voice (summary or full mode)
python3 scripts/run.py generate_audio.py \
--text-file /tmp/blog_audio_prepared.txt \
--voice Charon \
--model flash \
--output /path/to/audio/post-slug.mp3 \
--json
# Two voices (dialogue mode)
python3 scripts/run.py generate_audio.py \
--text-file /tmp/blog_audio_dialogue.txt \
--voice Puck \
--voice2 Kore \
--model pro \
--output /path/to/audio/post-slug-dialogue.mp3 \
--json
```
**Model selection:**
- `flash` (default): Fast, cheap. Good for summaries and standard narration.
- `pro`: Higher quality. Use for dialogue mode or premium content.
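When the call is driven from Python, the argument list can be assembled programmatically. This is a sketch under assumptions: `build_command` is a hypothetical helper, it uses only the flags shown in the commands above, and picking `pro` automatically when a second voice is given mirrors the dialogue example but is not documented behavior.

```python
def build_command(text_file, output, voice="Charon", voice2=None, model=None):
    """Build the argv list for scripts/run.py generate_audio.py."""
    if model is None:
        # Assumption: dialogue runs use 'pro', everything else 'flash'.
        model = "pro" if voice2 else "flash"
    if model not in ("flash", "pro"):
        raise ValueError(f"unknown model: {model!r}")
    command = [
        "python3", "scripts/run.py", "generate_audio.py",
        "--text-file", text_file,
        "--voice", voice,
    ]
    if voice2:
        command += ["--voice2", voice2]
    command += ["--model", model, "--output", output, "--json"]
    return command
```

Passing the list to `subprocess.run(command, capture_output=True, text=True)` avoids shell quoting problems with file paths. The structure of the `--json` output is not described here, so any parsing of it should be checked against the script itself.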
### Step 6: Deliver
Present the result to the user:
1. **File path**: where the audio was saved
2. **Duration**: human-readable (e.g., "3:42")
3. **Embed code**: ready-to-paste HTML5 audio tag
4. **Cost**: estimated API cost
5. **Placement suggestion**: where to insert the embed in the blog post
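Two of these items, the duration and the embed code, are easy to produce mechanically. The sketch below uses hypothetical helpers and assumes the duration is available as a number of seconds. The embed markup follows the standard HTML example in the Embedding Guide and assumes MP3 output.

```python
from html import escape


def format_duration(seconds):
    """Format a duration in seconds as M:SS."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"


def embed_html(src):
    """Return an HTML5 audio tag for an MP3 at the given path or URL."""
    return (
        '<audio controls preload="metadata">\n'
        f'  <source src="{escape(src, quote=True)}" type="audio/mpeg">\n'
        "  Your browser does not support the audio element.\n"
        "</audio>"
    )
```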
## Embedding Guide
### Standard HTML (Hugo, Jekyll, static sites)
```html
<audio controls preload="metadata">
<source src="audio/post-slug.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>
```
### MDX (Next.js, Gatsby)
```jsx
<audio controls preload="metadata">
<source src="/audio/post-slug.mp3" type="audio/mpeg" />
</audio>
```
### Wo