Skip to main content
ClaudeWave
Skill1.1k estrellas del repoactualizado 8d ago

blog-audio

Blog Audio generates professional audio narration of blog posts using Google's Gemini text-to-speech technology. It offers three modes: a 200-300 word summary, full article read-aloud, or two-speaker podcast dialogue, with 30 voice options across 80+ languages and HTML5 embed output. Use this skill when you need to create accessible audio versions of blog content or produce podcast-style episodes from written articles.

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/AgriciDaniel/claude-blog /tmp/blog-audio && cp -r /tmp/blog-audio/skills/blog-audio ~/.claude/skills/blog-audio
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

# Blog Audio: Gemini TTS Narration for Blog Posts

Generate professional audio narration of blog content using Google's Gemini TTS.
Three modes: summary (200-300 word spoken overview), full article read-aloud,
or two-speaker podcast dialogue. 30 voices, 80+ languages, HTML5 embed output.

## Quick Reference

| Command | What it does |
|---------|-------------|
| `/blog audio generate <file>` | Generate audio narration of a blog post |
| `/blog audio voices` | Show available voices with characteristics |
| `/blog audio setup` | Check/configure API key for Gemini TTS |

## Prerequisites

- Python 3.11+ (venv managed automatically by `run.py`)
- `GOOGLE_AI_API_KEY` environment variable (same key used by blog-image)
- FFmpeg (for WAV-to-MP3 conversion; falls back to WAV if missing)

## Always Use run.py Wrapper

```bash
# CORRECT:
python3 scripts/run.py generate_audio.py --text "..." --voice Charon --json

# WRONG:
python3 scripts/generate_audio.py --text "..."  # Fails without venv
```

## API Key Check (Gate Pattern)

Before generating audio, check for the API key:

```bash
echo $GOOGLE_AI_API_KEY
```

- If set: proceed with generation
- If not set: guide the user:
  "Audio generation requires a Google AI API key. Get one free at https://aistudio.google.com/apikey
   Then set it: `export GOOGLE_AI_API_KEY=your-key`
   This is the same key used by `/blog image`: if image generation works, audio works too."
- **When called internally** (from blog-write): return silently if key is missing.
  Never block the writing workflow.

## Setup

For `/blog audio setup`:

1. Check if `GOOGLE_AI_API_KEY` is set in environment
2. If blog-image is configured (check `.mcp.json`), the key is already available
3. If not, guide user to https://aistudio.google.com/apikey
4. Verify with a dry run: `python3 scripts/run.py generate_audio.py --text "Test" --dry-run --json`

## Voice Selection

For `/blog audio voices`:

Load `references/voices.md` and present the voice catalog to the user.

Ask the user which voice they prefer, or recommend based on content type:
- **Article narration**: Charon (Informative) or Sadaltager (Knowledgeable)
- **Tutorial/how-to**: Achird (Friendly) or Sulafat (Warm)
- **News/analysis**: Rasalgethi (Informative) or Schedar (Even)
- **Lifestyle/wellness**: Aoede (Breezy) or Vindemiatrix (Gentle)
- **Dialogue host**: Puck (Upbeat) or Laomedeia (Upbeat)
- **Dialogue expert**: Kore (Firm) or Charon (Informative)

## Generation Workflow

For `/blog audio generate <file>`:

### Step 1: Read the Blog Post

Read the file and extract:
- Title (from H1 or frontmatter)
- Full content (markdown body)
- Approximate word count

### Step 2: Choose Mode

Ask the user (or auto-select if they specified `--mode`):

| Mode | When to use | Output |
|------|-------------|--------|
| **Summary** | Quick audio overview (1-2 min) | 200-300 word spoken summary |
| **Full** | Complete read-aloud (5-15 min) | Full article as natural speech |
| **Dialogue** | Podcast-style (3-8 min) | Two-person conversation about the article |

### Step 3: Prepare Text

**CRITICAL:** Claude prepares the text. The script does TTS only.

**Summary mode:**
Write a 200-300 word spoken summary of the article. Rules:
- Write as natural speech, not written text
- Open with the article's key finding or answer
- Cover 3-5 main takeaways
- Close with actionable advice
- No markdown, no "In this article...", no meta-commentary
- Use conversational transitions ("Here's what matters...", "The key finding is...")

**Full mode:**
Strip the markdown content to clean spoken text:
- Headings become natural transitions ("Next, let's look at...")
- Links become plain text (remove URLs, keep anchor text)
- Images and charts: omit or briefly describe ("As the data shows...")
- Code blocks: describe verbally ("The code uses a for-loop to...")
- Lists: convert to natural sentences
- Remove frontmatter, schema markup, HTML tags
- Add brief intro: "This is [title], published on [date]."

**Dialogue mode:**
Write a 2-person conversation script about the article:
- Speaker1 = Host (curious, asks good questions)
- Speaker2 = Expert (knowledgeable, gives clear answers)
- Format each line as: `[Speaker1] What's the key takeaway here?`
- Cover the article's main points conversationally
- 15-25 exchanges (produces ~3-8 minutes)
- Natural, not stilted ("That's a great point" over "Indeed, as the research indicates")

### Step 4: Select Voice

If the user chose a voice, use it. Otherwise, recommend based on mode:
- Summary/Full: default to Charon (Informative)
- Dialogue: default to Puck (Host) + Kore (Expert)

### Step 5: Generate Audio

Write the prepared text to a temp file, then call:

```bash
# Single voice (summary or full mode)
python3 scripts/run.py generate_audio.py \
  --text-file /tmp/blog_audio_prepared.txt \
  --voice Charon \
  --model flash \
  --output /path/to/audio/post-slug.mp3 \
  --json

# Two voices (dialogue mode)
python3 scripts/run.py generate_audio.py \
  --text-file /tmp/blog_audio_dialogue.txt \
  --voice Puck \
  --voice2 Kore \
  --model pro \
  --output /path/to/audio/post-slug-dialogue.mp3 \
  --json
```

**Model selection:**
- `flash` (default): Fast, cheap. Good for summaries and standard narration.
- `pro`: Higher quality. Use for dialogue mode or premium content.

### Step 6: Deliver

Present the result to the user:
1. **File path**: where the audio was saved
2. **Duration**: human-readable (e.g., "3:42")
3. **Embed code**: ready-to-paste HTML5 audio tag
4. **Cost**: estimated API cost
5. **Placement suggestion**: where to insert the embed in the blog post

## Embedding Guide

### Standard HTML (Hugo, Jekyll, static sites)
```html
<audio controls preload="metadata">
  <source src="audio/post-slug.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>
```

### MDX (Next.js, Gatsby)
```jsx
<audio controls preload="metadata">
  <source src="/audio/post-slug.mp3" type="audio/mpeg" />
</audio>
```

### Wo