Geek-skills-podcast-generator
This skill generates professional AI-powered podcasts from text input using Volcano Engine's Podcast AI Model. It creates dual-speaker conversational audio content in Chinese, supporting customizable voice options, audio formats (mp3, ogg_opus, pcm, aac), and speech rate adjustments. Use when transforming written content into engaging podcast audio or creating multi-speaker conversational content with natural flow.
git clone --depth 1 https://github.com/staruhub/ClaudeSkills /tmp/geek-skills-podcast-generator && cp -r /tmp/geek-skills-podcast-generator/skills/Geek-skills-podcast-generator ~/.claude/skills/geek-skills-podcast-generator
# Podcast Generator
## Overview
Generate professional AI-powered podcasts using Volcano Engine's Podcast AI Model. This skill transforms text input into engaging dual-speaker podcast audio with natural conversation flow, supporting multiple audio formats and voice customization.
## Quick Start
To generate a podcast:
1. Ensure Volcano Engine credentials are available (APP_ID and ACCESS_KEY)
2. Prepare the podcast topic/content text (up to 25,000 characters)
3. Run the generation script with required parameters
4. Receive the output audio file in your preferred format
## Core Workflow
### Step 1: Prepare Input
**Required information:**
- Podcast topic or content text (Chinese, up to 25,000 characters)
- Volcano Engine APP ID
- Volcano Engine Access Key
**Optional customization:**
- Audio format (mp3, ogg_opus, pcm, aac)
- Sample rate (default: 24000 Hz)
- Speech rate (-50 to 100, where 100 = 2.0x speed)
- Speaker voices (default: male + female duo)
- Opening music (default: disabled)
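The limits listed above can be checked locally before any request is sent. The following is a minimal sketch and not part of the skill's script: `validate_podcast_options` is a hypothetical helper name, and only the character limit, the format list, and the speech-rate range are taken from this document.

```python
SUPPORTED_FORMATS = {"mp3", "ogg_opus", "pcm", "aac"}  # formats listed in this document
MAX_INPUT_CHARS = 25_000                                # documented input limit
SPEECH_RATE_RANGE = (-50, 100)                          # documented speech-rate range


def validate_podcast_options(text, audio_format="mp3", speech_rate=0):
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    if not text.strip():
        problems.append("input text is empty")
    if len(text) > MAX_INPUT_CHARS:
        problems.append(
            f"input text has {len(text)} characters; limit is {MAX_INPUT_CHARS}"
        )
    if audio_format not in SUPPORTED_FORMATS:
        problems.append(f"unsupported audio format: {audio_format!r}")
    low, high = SPEECH_RATE_RANGE
    if not low <= speech_rate <= high:
        problems.append(f"speech rate {speech_rate} is outside {low}..{high}")
    return problems


# Example: an unsupported format and an out-of-range speech rate.
for problem in validate_podcast_options("你好", audio_format="wav", speech_rate=150):
    print(problem)
```

Returning a list instead of raising on the first problem lets a caller report every issue at once.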
### Step 2: Generate Podcast
Run the generation script:
```bash
python scripts/generate_podcast.py \
--text "Your podcast topic or content" \
--output "/path/to/output.mp3" \
--app-id "YOUR_APP_ID" \
--access-key "YOUR_ACCESS_KEY" \
--format mp3 \
--sample-rate 24000 \
--speech-rate 0
```
**Alternative: Use as Python module**
```python
import asyncio
from scripts.generate_podcast import PodcastGenerator


async def create_podcast():
    generator = PodcastGenerator(
        app_id="YOUR_APP_ID",
        access_key="YOUR_ACCESS_KEY"
    )
    result = await generator.generate_podcast(
        # Chinese topic text: "Analyze the current state of large-model development"
        input_text="分析下当前的大模型发展",
        output_path="podcast.mp3",
        audio_format="mp3",
        sample_rate=24000,
        speech_rate=0,
        use_head_music=False
    )
    if result['success']:
        print(f"✅ Podcast generated: {result['output_path']}")
    else:
        print(f"❌ Failed: {result['error']}")


asyncio.run(create_podcast())
```
### Step 3: Handle Output
The script will:
- Stream audio data in real-time
- Display progress for each speaking round
- Save the complete audio file to the specified path
- Return generation statistics (file size, round count, etc.)
## Advanced Features
### Resume from Interruption
If generation is interrupted, use the resume capability:
```python
result = await generator.generate_podcast(
    input_text="Your topic",
    output_path="podcast.mp3",
    retry_info={
        "retry_task_id": "previous_task_id",
        "last_finished_round_id": 5
    }
)
```
The system will continue from the last completed round instead of starting over.
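When the task ID and round number are copied from logs by hand, a small helper can catch obvious mistakes before the resume call. This is a sketch only: `build_retry_info` is a hypothetical name, and the only parts taken from this document are the two dictionary keys. It assumes the task ID is a non-empty string and the round ID is an integer.

```python
def build_retry_info(task_id, last_finished_round_id):
    """Hypothetical helper: build the retry_info dict from values found in the logs."""
    if not isinstance(task_id, str) or not task_id.strip():
        raise ValueError("task_id must be a non-empty string")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(last_finished_round_id, bool) or not isinstance(
        last_finished_round_id, int
    ):
        raise ValueError("last_finished_round_id must be an integer")
    return {
        "retry_task_id": task_id.strip(),
        "last_finished_round_id": last_finished_round_id,
    }


retry_info = build_retry_info("previous_task_id", 5)
print(retry_info)
```

The returned dictionary can be passed as the `retry_info` argument of `generate_podcast`.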
### Custom Speaker Configuration
Specify different speaker voices:
```python
result = await generator.generate_podcast(
    input_text="Your topic",
    output_path="podcast.mp3",
    speakers=[
        "zh_male_dayixiansheng_v2_saturn_bigtts",
        "zh_female_mizaitongxue_v2_saturn_bigtts"
    ]
)
```
### Audio Format Options
Supported formats and use cases:
- **mp3**: Best for general distribution (compressed, widely supported)
- **ogg_opus**: High quality with good compression
- **pcm**: Uncompressed raw audio (largest file size, no compression loss)
- **aac**: Modern compressed format with good quality
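To get a feel for how large raw PCM output can be, the size can be estimated from duration and sample rate. This is a rough sketch under stated assumptions: it assumes 16-bit (2-byte) mono samples, which this document does not confirm, and `pcm_size_bytes` is a hypothetical helper name.

```python
def pcm_size_bytes(duration_seconds, sample_rate=24000, bytes_per_sample=2, channels=1):
    """Estimate raw PCM size; 16-bit mono is an assumption, not a documented fact."""
    return int(duration_seconds * sample_rate * bytes_per_sample * channels)


# A 10-minute podcast at the default 24000 Hz sample rate.
ten_minutes = pcm_size_bytes(600)
print(f"{ten_minutes / 1_000_000:.1f} MB")  # prints "28.8 MB"
```

Compressed formats such as mp3 or aac will typically be several times smaller for the same duration, which is why they suit distribution better.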
### Speech Rate Adjustment
Control speaking speed:
- `speech_rate=0`: Normal speed (1.0x)
- `speech_rate=100`: 2x speed (fast)
- `speech_rate=-50`: 0.5x speed (slow)
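The three values above are consistent with a linear mapping of `1 + speech_rate / 100`. The sketch below assumes that mapping also holds for values in between, which this document does not state explicitly; `speech_rate_to_multiplier` is a hypothetical helper name.

```python
def speech_rate_to_multiplier(speech_rate):
    """Convert a speech_rate value to a playback-speed multiplier (assumed linear)."""
    if not -50 <= speech_rate <= 100:
        raise ValueError("speech_rate must be between -50 and 100")
    return 1.0 + speech_rate / 100.0


for rate in (-50, 0, 100):
    print(rate, speech_rate_to_multiplier(rate))
```

Under this assumption, `speech_rate=-20` corresponds to roughly 0.8x speed.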
## Common Usage Patterns
### Pattern 1: Quick Blog Post to Podcast
```python
blog_text = """
[Your blog post content here - can be long form]
"""
result = await generator.generate_podcast(
    input_text=blog_text,
    output_path="blog_podcast.mp3"
)
```
### Pattern 2: Research Paper Summary
```python
paper_summary = "Summarize the key findings of the latest AI research..."
result = await generator.generate_podcast(
    input_text=paper_summary,
    output_path="research_podcast.mp3",
    use_head_music=True  # Add opening music for professional touch
)
```
### Pattern 3: Educational Content
```python
lesson_topic = "Explain quantum computing concepts for beginners"
result = await generator.generate_podcast(
    input_text=lesson_topic,
    output_path="lesson.mp3",
    speech_rate=-20  # Slightly slower for educational content
)
```
## Error Handling
Common issues and solutions:
**Connection Errors:**
- Verify APP_ID and ACCESS_KEY are correct
- Check network connectivity
- Ensure firewall allows WebSocket connections
**Text Too Long:**
- The model truncates at 25,000 characters
- Split long content into multiple podcasts
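One way to split long content is to cut at sentence boundaries so that each chunk stays within the limit. The following is a minimal sketch, not part of the skill's script: `split_text` is a hypothetical helper, and the set of boundary characters is an assumption chosen for Chinese text.

```python
MAX_CHARS = 25_000  # limit stated in this document


def split_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    boundaries = "。！？!?\n"  # assumed sentence-ending characters
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            # Look for the last sentence boundary inside the current window.
            cut = max(text.rfind(ch, start, end) for ch in boundaries)
            if cut > start:
                end = cut + 1  # keep the punctuation with its sentence
        chunks.append(text[start:end])
        start = end
    return chunks


print(split_text("一二三。四五六。七八", limit=5))  # prints ['一二三。', '四五六。', '七八']
```

Each chunk can then be passed to `generate_podcast` with its own output path. If a window contains no boundary character, the sketch falls back to a hard cut at the limit.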
**Audio Not Generated:**
- Check output path is writable
- Verify sufficient disk space
- Review error messages for specific issues
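The first two checks can be run before generation starts. This is a sketch using only the Python standard library: `check_output_path` is a hypothetical helper, and the 50 MiB free-space threshold is an arbitrary illustrative default, not a documented requirement.

```python
import os
import shutil
import tempfile


def check_output_path(output_path, min_free_bytes=50 * 1024 * 1024):
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    directory = os.path.dirname(os.path.abspath(output_path))
    if not os.path.isdir(directory):
        return [f"directory does not exist: {directory}"]
    problems = []
    if not os.access(directory, os.W_OK):
        problems.append(f"directory is not writable: {directory}")
    free = shutil.disk_usage(directory).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free; {min_free_bytes} requested")
    return problems


# Example: check a target inside the system temp directory.
target = os.path.join(tempfile.gettempdir(), "podcast.mp3")
print(check_output_path(target, min_free_bytes=0))
```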
**Incomplete Generation:**
- Use retry_info to resume from last completed round
- Check logs for the task_id and last_finished_round_id
## Resource Usage
### scripts/generate_podcast.py
Complete WebSocket client implementation for Volcano Engine's Podcast API:
- Handles binary protocol communication
- Manages streaming audio reception
- Implements automatic retry logic
- Provides both CLI and programmatic interfaces
**Key features:**
- Async/await pattern for efficient I/O
- Progress tracking with emoji indicators
- Comprehensive error handling
- Flexible parameter configuration
### references/api_reference.md
Detailed API documentation including:
- Complete parameter specifications
- WebSocket protocol details
- Event type reference
- Error code explanations
Consult this file for:
- Advanced API usage
- Protocol-level debugging
- Custom implementation needs
## Requirements
**Python dependencies:**
```bash
pip install websockets
```
**Credentials:**
- Volcano Engine APP ID (obtain from console: https://console.volcengine.com/speech/service/10028)
- Volcano Engine Access Key
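To avoid hard-coding credentials in scripts, they can be read from environment variables. This is a sketch under stated assumptions: the variable names `VOLC_APP_ID` and `VOLC_ACCESS_KEY` are hypothetical and are not defined by the skill, and `load_credentials` is an illustrative helper.

```python
import os


def load_credentials(env=None):
    """Read the APP ID and Access Key from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    app_id = env.get("VOLC_APP_ID", "").strip()
    access_key = env.get("VOLC_ACCESS_KEY", "").strip()
    missing = [
        name
        for name, value in (("VOLC_APP_ID", app_id), ("VOLC_ACCESS_KEY", access_key))
        if not value
    ]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return app_id, access_key


# Example with an explicit mapping instead of the real environment.
app_id, access_key = load_credentials(
    {"VOLC_APP_ID": "YOUR_APP_ID", "VOLC_ACCESS_KEY": "YOUR_ACCESS_KEY"}
)
print(app_id)
```

The returned pair can be passed to `PodcastGenerator(app_id=..., access_key=...)` as in the module example.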
## Best Practices
1. **Input Text Quality**: Use clear, well-structured Chinese text for best results
2. **Length Optimization**: Aim for 500-3000 characters for optimal podcast length