
Geek-skills-podcast-generator

This skill generates professional AI-powered podcasts from text input using Volcano Engine's Podcast AI Model. It creates dual-speaker conversational audio content in Chinese, supporting customizable voice options, audio formats (mp3, ogg_opus, pcm, aac), and speech rate adjustments. Use when transforming written content into engaging podcast audio or creating multi-speaker conversational content with natural flow.

Repository: ClaudeSkills

Install in Claude Code

git clone --depth 1 https://github.com/staruhub/ClaudeSkills /tmp/geek-skills-podcast-generator && cp -r /tmp/geek-skills-podcast-generator/skills/Geek-skills-podcast-generator ~/.claude/skills/geek-skills-podcast-generator

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Podcast Generator

## Overview

Generate professional AI-powered podcasts using Volcano Engine's Podcast AI Model. This skill transforms text input into engaging dual-speaker podcast audio with natural conversation flow, supporting multiple audio formats and voice customization.

## Quick Start

To generate a podcast:

1. Ensure Volcano Engine credentials are available (APP_ID and ACCESS_KEY)
2. Prepare the podcast topic/content text (up to 25,000 characters)
3. Run the generation script with required parameters
4. Receive the output audio file in your preferred format

## Core Workflow

### Step 1: Prepare Input

**Required information:**
- Podcast topic or content text (Chinese, up to 25,000 characters)
- Volcano Engine APP ID
- Volcano Engine Access Key

**Optional customization:**
- Audio format (mp3, ogg_opus, pcm, aac)
- Sample rate (default: 24000 Hz)
- Speech rate (-50 to 100; 0 = normal speed, 100 = 2.0x, -50 = 0.5x)
- Speaker voices (default: male + female duo)
- Opening music (default: disabled)
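
The limits listed above can be checked before any request is sent. The following is a minimal sketch, not part of the skill: `PodcastRequest` and `validate_request` are hypothetical names, and the defaults simply mirror the values used in the examples in this document.

```python
from dataclasses import dataclass

MAX_TEXT_CHARS = 25_000                       # limit stated in this document
AUDIO_FORMATS = {"mp3", "ogg_opus", "pcm", "aac"}
SPEECH_RATE_RANGE = (-50, 100)


@dataclass
class PodcastRequest:
    """Hypothetical container for the inputs listed in Step 1."""
    text: str
    audio_format: str = "mp3"
    sample_rate: int = 24000
    speech_rate: int = 0
    use_head_music: bool = False


def validate_request(req: PodcastRequest) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if not req.text.strip():
        problems.append("text is empty")
    if len(req.text) > MAX_TEXT_CHARS:
        problems.append(
            f"text has {len(req.text)} characters; limit is {MAX_TEXT_CHARS}"
        )
    if req.audio_format not in AUDIO_FORMATS:
        problems.append(f"unsupported audio format: {req.audio_format!r}")
    low, high = SPEECH_RATE_RANGE
    if not low <= req.speech_rate <= high:
        problems.append(f"speech_rate {req.speech_rate} is outside {low}..{high}")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue in one pass.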

### Step 2: Generate Podcast

Run the generation script:

```bash
python scripts/generate_podcast.py \
  --text "Your podcast topic or content" \
  --output "/path/to/output.mp3" \
  --app-id "YOUR_APP_ID" \
  --access-key "YOUR_ACCESS_KEY" \
  --format mp3 \
  --sample-rate 24000 \
  --speech-rate 0
```

**Alternative: Use as Python module**

```python
import asyncio
from scripts.generate_podcast import PodcastGenerator

async def create_podcast():
    generator = PodcastGenerator(
        app_id="YOUR_APP_ID",
        access_key="YOUR_ACCESS_KEY"
    )
    
    result = await generator.generate_podcast(
        input_text="分析下当前的大模型发展",
        output_path="podcast.mp3",
        audio_format="mp3",
        sample_rate=24000,
        speech_rate=0,
        use_head_music=False
    )
    
    if result['success']:
        print(f"✅ Podcast generated: {result['output_path']}")
    else:
        print(f"❌ Failed: {result['error']}")

asyncio.run(create_podcast())
```

### Step 3: Handle Output

The script will:
- Stream audio data in real-time
- Display progress for each speaking round
- Save the complete audio file to the specified path
- Return generation statistics (file size, round count, etc.)
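
The result can be turned into a short status message as sketched below. The document does not name the keys that carry the generation statistics, so this sketch relies only on `success`, `output_path`, and `error` (the keys shown in Step 2) and measures the file on disk instead. `summarize_result` is a hypothetical helper.

```python
import os


def summarize_result(result: dict) -> str:
    """Turn a result dictionary into a one-line status message."""
    if not result.get("success"):
        return f"failed: {result.get('error', 'unknown error')}"
    path = result["output_path"]
    # Measure the file on disk rather than assuming a statistics key name.
    size_kib = os.path.getsize(path) / 1024
    return f"saved {path} ({size_kib:.1f} KiB)"
```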

## Advanced Features

The snippets in this section run inside an async function and reuse the `generator` object created in Step 2.

### Resume from Interruption

If generation is interrupted, use the resume capability:

```python
result = await generator.generate_podcast(
    input_text="Your topic",
    output_path="podcast.mp3",
    retry_info={
        "retry_task_id": "previous_task_id",
        "last_finished_round_id": 5
    }
)
```

The system will continue from the last completed round instead of starting over.
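
One way to automate the resume step is a small retry loop. This is a sketch under stated assumptions: `generate_with_resume` is a hypothetical helper, and it assumes a failed result exposes `task_id` and `last_finished_round_id` keys, which this document does not confirm (it only says to look for those values in the logs). It also assumes that passing `retry_info=None` means a fresh start.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def generate_with_resume(
    generate: Callable[[Optional[dict]], Awaitable[dict]],
    max_attempts: int = 3,
) -> dict:
    """Call `generate(retry_info)` until it succeeds or attempts run out."""
    retry_info = None
    result: dict = {"success": False, "error": "not attempted"}
    for _ in range(max_attempts):
        result = await generate(retry_info)
        if result.get("success"):
            return result
        # Assumed keys: the document does not name the fields carrying these values.
        task_id = result.get("task_id")
        last_round = result.get("last_finished_round_id")
        if task_id is None or last_round is None:
            break  # nothing to resume from
        retry_info = {
            "retry_task_id": task_id,
            "last_finished_round_id": last_round,
        }
    return result
```

Under those assumptions, `generate` could be a small wrapper such as `lambda info: generator.generate_podcast(input_text=text, output_path="podcast.mp3", retry_info=info)`.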

### Custom Speaker Configuration

Specify different speaker voices:

```python
result = await generator.generate_podcast(
    input_text="Your topic",
    output_path="podcast.mp3",
    speakers=[
        "zh_male_dayixiansheng_v2_saturn_bigtts",
        "zh_female_mizaitongxue_v2_saturn_bigtts"
    ]
)
```

### Audio Format Options

Supported formats and use cases:

- **mp3**: Best for general distribution (compressed, widely supported)
- **ogg_opus**: High quality with good compression
- **pcm**: Uncompressed raw audio (largest file size, no compression loss)
- **aac**: Modern compressed format with good quality
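
Raw `pcm` output has no header, so most players cannot open it directly. The sketch below wraps it in a WAV container using Python's standard `wave` module. It assumes 16-bit mono samples at the requested sample rate; the document does not state the sample width or channel count, so check `references/api_reference.md` before relying on these values. `pcm_to_wav` is a hypothetical helper.

```python
import wave


def pcm_to_wav(pcm_path: str, wav_path: str, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> None:
    """Wrap headerless PCM samples in a WAV container."""
    with open(pcm_path, "rb") as src:
        frames = src.read()
    with wave.open(wav_path, "wb") as dst:
        dst.setnchannels(channels)       # 1 = mono (assumed)
        dst.setsampwidth(sample_width)   # bytes per sample; 2 = 16-bit (assumed)
        dst.setframerate(sample_rate)
        dst.writeframes(frames)
```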

### Speech Rate Adjustment

Control speaking speed:

- `speech_rate=0`: Normal speed (1.0x)
- `speech_rate=100`: 2x speed (fast)
- `speech_rate=-50`: 0.5x speed (slow)
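
The three data points above fit a straight line, multiplier = 1 + speech_rate / 100. The sketch below assumes the mapping is linear across the whole range, which the document implies but does not state; `speech_rate_to_multiplier` is a hypothetical helper.

```python
def speech_rate_to_multiplier(speech_rate: int) -> float:
    """Convert a speech_rate value to a playback-speed multiplier.

    Assumes a linear mapping through the documented points
    (-50 -> 0.5x, 0 -> 1.0x, 100 -> 2.0x).
    """
    if not -50 <= speech_rate <= 100:
        raise ValueError(f"speech_rate must be in -50..100, got {speech_rate}")
    return 1.0 + speech_rate / 100


print(speech_rate_to_multiplier(0))    # 1.0
print(speech_rate_to_multiplier(100))  # 2.0
print(speech_rate_to_multiplier(-50))  # 0.5
```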

## Common Usage Patterns

The snippets in this section run inside an async function and reuse the `generator` object created in Step 2.

### Pattern 1: Quick Blog Post to Podcast

```python
blog_text = """
[Your blog post content here - can be long form]
"""

result = await generator.generate_podcast(
    input_text=blog_text,
    output_path="blog_podcast.mp3"
)
```

### Pattern 2: Research Paper Summary

```python
paper_summary = "Summarize the key findings of the latest AI research..."

result = await generator.generate_podcast(
    input_text=paper_summary,
    output_path="research_podcast.mp3",
    use_head_music=True  # Add opening music for professional touch
)
```

### Pattern 3: Educational Content

```python
lesson_topic = "Explain quantum computing concepts for beginners"

result = await generator.generate_podcast(
    input_text=lesson_topic,
    output_path="lesson.mp3",
    speech_rate=-20  # Slightly slower for educational content
)
```

## Error Handling

Common issues and solutions:

**Connection Errors:**
- Verify APP_ID and ACCESS_KEY are correct
- Check network connectivity
- Ensure firewall allows WebSocket connections

**Text Too Long:**
- The model truncates at 25,000 characters
- Split long content into multiple podcasts
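
Splitting can be done on paragraph boundaries so that each podcast covers a coherent part of the text. A minimal sketch, assuming paragraphs are separated by blank lines; `split_text` is a hypothetical helper, not part of the skill.

```python
def split_text(text: str, limit: int = 25_000) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Paragraph breaks (blank lines) are preferred as split points.
    """
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        # A single oversized paragraph is cut at the hard limit.
        while len(paragraph) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:limit])
            paragraph = paragraph[limit:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `generate_podcast` with its own output path.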

**Audio Not Generated:**
- Check output path is writable
- Verify sufficient disk space
- Review error messages for specific issues
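
The first two checks can be run before generation starts. A sketch using only the standard library; `check_output_path` is a hypothetical helper, and the 50 MiB default threshold is an arbitrary assumption rather than a figure from the skill.

```python
import os
import shutil


def check_output_path(output_path: str,
                      min_free_bytes: int = 50 * 1024 * 1024) -> list[str]:
    """Return reasons the output file may fail to be written (empty if none)."""
    directory = os.path.dirname(os.path.abspath(output_path))
    if not os.path.isdir(directory):
        return [f"directory does not exist: {directory}"]
    problems = []
    if not os.access(directory, os.W_OK):
        problems.append(f"directory is not writable: {directory}")
    free = shutil.disk_usage(directory).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free; want at least {min_free_bytes}")
    return problems
```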

**Incomplete Generation:**
- Use retry_info to resume from last completed round
- Check logs for the task_id and last_finished_round_id

## Resource Usage

### scripts/generate_podcast.py

Complete WebSocket client implementation for Volcano Engine's Podcast API:
- Handles binary protocol communication
- Manages streaming audio reception
- Implements automatic retry logic
- Provides both CLI and programmatic interfaces

**Key features:**
- Async/await pattern for efficient I/O
- Progress tracking with emoji indicators
- Comprehensive error handling
- Flexible parameter configuration

### references/api_reference.md

Detailed API documentation including:
- Complete parameter specifications
- WebSocket protocol details
- Event type reference
- Error code explanations

Consult this file for:
- Advanced API usage
- Protocol-level debugging
- Custom implementation needs

## Requirements

**Python dependencies:**
```bash
pip install websockets
```

**Credentials:**
- Volcano Engine APP ID (obtain from console: https://console.volcengine.com/speech/service/10028)
- Volcano Engine Access Key
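
To keep credentials out of scripts and shell history, they can be read from environment variables. This is a sketch only: `load_credentials` and the variable names `VOLCANO_APP_ID` and `VOLCANO_ACCESS_KEY` are hypothetical, since the skill itself does not define them.

```python
import os


def load_credentials(env=os.environ) -> tuple[str, str]:
    """Read the APP ID and Access Key from environment variables."""
    # Hypothetical variable names; the skill itself does not define them.
    names = ("VOLCANO_APP_ID", "VOLCANO_ACCESS_KEY")
    values = [env.get(name, "").strip() for name in names]
    missing = [name for name, value in zip(names, values) if not value]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return values[0], values[1]
```

The returned pair can be passed as `app_id` and `access_key` when constructing `PodcastGenerator`.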

## Best Practices

1. **Input Text Quality**: Use clear, well-structured Chinese text for best results
2. **Length Optimization**: Aim for 500-3000 characters for optimal podcast length
