doubao-tts
The doubao-tts skill generates high-quality speech audio from text using Volcengine's Doubao TTS API, supporting both real-time synthesis for short text under 300 characters and asynchronous processing for long-form content up to 100,000 characters. Use this skill when users request audio generation, text-to-speech conversion, podcast creation, voiceovers, or narration of any written content.
git clone --depth 1 https://github.com/xvirobotics/metabot /tmp/doubao-tts && cp -r /tmp/doubao-tts/.claude/skills/doubao-tts ~/.claude/skills/doubao-tts
# Doubao TTS: 豆包语音合成 (Doubao Speech Synthesis)
Generate high-quality speech audio from text using Volcengine's Doubao TTS API. Supports short-form (real-time) and long-form (async, up to 100K characters) synthesis.
## When to Use
- User asks to generate audio, podcasts, voiceovers, or narration
- User wants text-to-speech for any content
- User asks to "read this aloud" or "make an audio version"
## Quick Usage
Use the `doubao-tts` CLI tool (installed at `bin/doubao-tts`):
```bash
# Short text (real-time, < 300 chars)
bin/doubao-tts "你好世界" -o output.mp3
# Long text from file (async mode, up to 100K chars)
bin/doubao-tts -f article.txt -o podcast.mp3
# Pipe content
echo "Hello world" | bin/doubao-tts -o hello.mp3
# Choose voice
bin/doubao-tts "你好" -v zh_male_aojiaobazong_moon_bigtts -o output.mp3
# Adjust speed/volume/pitch
bin/doubao-tts "你好" --speed 1.2 --volume 1.5 -o output.mp3
```
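When the CLI is driven from a script rather than typed by hand, the invocations above can be assembled programmatically. The following is a minimal Python sketch, not part of the skill itself: `build_tts_command` and `synthesize` are hypothetical helper names, and only the flags shown above (`-f`, `-v`, `--speed`, `--volume`, `-o`) are used.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_tts_command(
    output: str,
    text: Optional[str] = None,
    text_file: Optional[str] = None,
    voice: Optional[str] = None,
    speed: Optional[float] = None,
    volume: Optional[float] = None,
    cli: str = "bin/doubao-tts",
) -> List[str]:
    """Build an argument list for the doubao-tts CLI (hypothetical helper)."""
    # Exactly one input source: inline text or a text file.
    if (text is None) == (text_file is None):
        raise ValueError("pass exactly one of text or text_file")
    cmd = [cli]
    if text is not None:
        cmd.append(text)
    else:
        cmd += ["-f", text_file]
    if voice:
        cmd += ["-v", voice]
    if speed is not None:
        cmd += ["--speed", str(speed)]
    if volume is not None:
        cmd += ["--volume", str(volume)]
    cmd += ["-o", output]
    return cmd


def synthesize(**kwargs) -> Path:
    """Run the CLI and return the output path; raises if the CLI exits non-zero."""
    subprocess.run(build_tts_command(**kwargs), check=True)
    return Path(kwargs["output"])
```

For example, `build_tts_command(output="podcast.mp3", text_file="article.txt")` yields the same arguments as the long-text command above. Passing a list to `subprocess.run` avoids shell quoting problems with text that contains spaces or quotes.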
## Available Voices (verified working)
### Chinese Female
| Voice ID | Description |
|----------|-------------|
| `zh_female_sajiaonvyou_moon_bigtts` | 撒娇女友 (Coquettish Girlfriend; default) |
| `zh_female_gaolengyujie_moon_bigtts` | 高冷御姐 (Cool, Aloof Lady) |
| `zh_female_tianmeixiaoyuan_moon_bigtts` | 甜美校园 (Sweet Campus) |
| `zh_female_yuanqinvyou_moon_bigtts` | 元气女友 (Energetic Girlfriend) |
| `zh_female_wanwanxiaohe_moon_bigtts` | 弯弯小何 (Wanwan Xiaohe) |
| `zh_female_linjianvhai_moon_bigtts` | 邻家女孩 (Girl Next Door) |
### Chinese Male
| Voice ID | Description |
|----------|-------------|
| `zh_male_aojiaobazong_moon_bigtts` | 傲娇霸总 (Tsundere CEO) |
| `zh_male_jingqiangkanye_moon_bigtts` | 京腔侃爷 (Beijing-Accent Chatterbox) |
| `zh_male_wennuanahu_moon_bigtts` | 温暖阿虎 (Warm Ahu) |
| `zh_male_yangguangqingnian_moon_bigtts` | 阳光青年 (Sunny Youth) |
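> Note: Other voices (the BV series and those with the `mars` suffix) require a different resource ID. To use more voices, enable the corresponding resource in the Volcengine console.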
## API Details
### Environment Variables (already configured in MetaBot .env)
```
VOLCENGINE_TTS_APPID=<app_id>
VOLCENGINE_TTS_ACCESS_KEY=<access_key>
# Optional
VOLCENGINE_TTS_RESOURCE_ID=volc.service_type.10029
```
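To show how these variables map onto a request, here is a minimal Python sketch that turns them into the `X-Api-*` headers used by the raw curl example in this document. `build_headers` is a hypothetical helper, and falling back to `volc.service_type.10029` when `VOLCENGINE_TTS_RESOURCE_ID` is unset is an assumption based on that variable being marked optional.

```python
import os
import uuid
from typing import Dict, Mapping

# Assumption: this is the fallback when VOLCENGINE_TTS_RESOURCE_ID is unset.
DEFAULT_RESOURCE_ID = "volc.service_type.10029"
REQUIRED_VARS = ("VOLCENGINE_TTS_APPID", "VOLCENGINE_TTS_ACCESS_KEY")


def build_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build request headers from the environment (hypothetical helper)."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "Content-Type": "application/json",
        "X-Api-App-Id": env["VOLCENGINE_TTS_APPID"],
        "X-Api-Access-Key": env["VOLCENGINE_TTS_ACCESS_KEY"],
        "X-Api-Resource-Id": env.get("VOLCENGINE_TTS_RESOURCE_ID") or DEFAULT_RESOURCE_ID,
        # A fresh request ID per call, mirroring $(uuidgen) in the curl example.
        "X-Api-Request-Id": str(uuid.uuid4()),
    }
```

Failing early on a missing variable gives a clearer error than an authentication failure from the API.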
### Short-form API (real-time, < 300 chars)
- Endpoint: `https://openspeech.bytedance.com/api/v3/tts/unidirectional`
- Response: chunked JSON with base64 audio in `data` field
- Latency: < 1 second
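The chunked response described above can be reassembled into a single audio file. The sketch below assumes, as the raw curl example in this document does, that each chunk arrives as one JSON object per line; `decode_audio_stream` is a hypothetical name, and any response fields other than `data` are ignored.

```python
import base64
import json
from typing import Iterable


def decode_audio_stream(lines: Iterable[str]) -> bytes:
    """Concatenate the base64 audio carried in the `data` field of each JSON line."""
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        data = message.get("data") if isinstance(message, dict) else None
        if data:
            chunks.append(base64.b64decode(data))
    return b"".join(chunks)
```

The result can be written straight to disk, for example with `Path("output.mp3").write_bytes(audio)`.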
### Long-form API (async, up to 100K chars)
- Submit: `POST https://openspeech.bytedance.com/api/v1/tts_async/submit`
- Query: `GET https://openspeech.bytedance.com/api/v1/tts_async/query?appid=X&task_id=Y`
- Response: `audio_url` (valid for 1 hour)
- Latency: seconds to minutes depending on text length
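Because the long-form API is asynchronous, a client submits once and then polls the query endpoint until `audio_url` appears. The sketch below covers only the polling half and makes several assumptions: `fetch_status` is a caller-supplied function that performs the GET (with whatever authentication the API requires) and returns the parsed JSON; the response is treated as unfinished until it contains a non-empty `audio_url`; failure statuses are not modeled. The function names, polling interval, and timeout are illustrative.

```python
import time
from typing import Any, Callable, Dict
from urllib.parse import urlencode

QUERY_ENDPOINT = "https://openspeech.bytedance.com/api/v1/tts_async/query"


def build_query_url(appid: str, task_id: str) -> str:
    """Build the query URL in the documented ?appid=X&task_id=Y form."""
    return QUERY_ENDPOINT + "?" + urlencode({"appid": appid, "task_id": task_id})


def wait_for_audio_url(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the response contains `audio_url`, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        url = status.get("audio_url")
        if url:
            return url
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish within the timeout")
        sleep(interval)
```

Since the returned `audio_url` is valid for one hour, download the file promptly rather than storing the link.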
## Workflow for Podcasts
1. **Write the script**: Create the podcast script as Markdown or plain text.
2. **Generate audio**: Run `bin/doubao-tts -f script.txt -v zh_male_aojiaobazong_moon_bigtts -o podcast.mp3`.
3. **Copy to outputs**: Run `cp podcast.mp3 /tmp/metabot-outputs/<chatId>/` to send the file to the user.
4. **Multi-voice podcasts**: Generate each speaker's segments separately, then concatenate them with `ffmpeg`.
## Multi-Voice Podcast Example
```bash
# Generate segments for different speakers
bin/doubao-tts -f host_lines.txt -v zh_male_aojiaobazong_moon_bigtts -o host.mp3
bin/doubao-tts -f guest_lines.txt -v zh_female_gaolengyujie_moon_bigtts -o guest.mp3
# Concatenate (requires ffmpeg)
echo "file 'host.mp3'" > list.txt
echo "file 'guest.mp3'" >> list.txt
ffmpeg -f concat -safe 0 -i list.txt -c copy podcast.mp3
```
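With more than two segments, for instance a dialogue that alternates between speakers, writing `list.txt` by hand gets tedious. The sketch below generates the concat list from segment names given in playback order. `concat_list_text` and `write_concat_list` are hypothetical helpers; single quotes in file names are escaped the way ffmpeg's concat demuxer expects.

```python
from pathlib import Path
from typing import Iterable


def concat_list_text(segments: Iterable[str]) -> str:
    """Return the contents of an ffmpeg concat list, one `file '...'` line per segment."""
    lines = []
    for name in segments:
        # Inside single quotes, a literal quote is written as '\'' for ffmpeg.
        escaped = name.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def write_concat_list(segments: Iterable[str], list_path: str = "list.txt") -> Path:
    """Write the concat list to disk and return its path."""
    path = Path(list_path)
    path.write_text(concat_list_text(segments), encoding="utf-8")
    return path
```

Calling `write_concat_list(["host.mp3", "guest.mp3"])` produces the same `list.txt` as the two `echo` commands above, ready for the `ffmpeg -f concat` step.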
## Raw curl (if CLI not available)
```bash
# Short-form
curl -X POST "https://openspeech.bytedance.com/api/v3/tts/unidirectional" \
  -H "Content-Type: application/json" \
  -H "X-Api-App-Id: $VOLCENGINE_TTS_APPID" \
  -H "X-Api-Access-Key: $VOLCENGINE_TTS_ACCESS_KEY" \
  -H "X-Api-Resource-Id: volc.service_type.10029" \
  -H "X-Api-Request-Id: $(uuidgen)" \
  -d '{
    "req_params": {
      "text": "你好世界",
      "speaker": "zh_female_sajiaonvyou_moon_bigtts",
      "audio_params": {"format": "mp3", "sample_rate": 24000}
    }
  }' | python3 -c "
import sys, json, base64
# Collect the base64 audio from each JSON chunk (one object per line)
chunks = []
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    try:
        d = json.loads(line)
        if d.get('data'):
            chunks.append(base64.b64decode(d['data']))
    except Exception:
        pass  # skip malformed lines
# Write the decoded audio bytes to stdout
sys.stdout.buffer.write(b''.join(chunks))
" > output.mp3
```