
venice-audio-speech

The venice-audio-speech skill converts text to speech via a POST endpoint supporting nine TTS models (Kokoro, Qwen 3, xAI, Inworld, Chatterbox, Orpheus, ElevenLabs Turbo, MiniMax, Gemini Flash) with configurable voices, output formats (mp3, opus, aac, flac, wav, pcm), streaming capabilities, and optional style/emotion parameters. Use this skill when building narration features, voice replies, UI audio generation, or when specific voice families and emotional styling are needed.

Install in Claude Code

git clone --depth 1 https://github.com/veniceai/skills /tmp/venice-audio-speech && cp -r /tmp/venice-audio-speech/skills/venice-audio-speech ~/.claude/skills/venice-audio-speech

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Venice TTS (`/audio/speech`)

`POST /api/v1/audio/speech` converts text to an audio stream or file. OpenAI-compatible — the OpenAI SDK's `audio.speech.create()` works as a drop-in.

## Use when

- You want narration, voice replies, or UI audio from text.
- You need a specific voice family (ElevenLabs, Kokoro, xAI, Qwen 3, Orpheus, Chatterbox, MiniMax, Inworld, Gemini Flash).
- You want streaming audio returned sentence-by-sentence.
- You need style/emotion control on supported models.

For music generation (lyrics + instrumental), see [`venice-audio-music`](../venice-audio-music/SKILL.md). For transcription (audio → text), see [`venice-audio-transcription`](../venice-audio-transcription/SKILL.md).

## Minimal request

```bash
curl https://api.venice.ai/api/v1/audio/speech \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-xai-v1",
    "voice": "eve",
    "input": "Hello, welcome to Venice Voice.",
    "response_format": "mp3",
    "speed": 1.0,
    "streaming": false
  }' --output hello.mp3
```

The response body is the raw audio; its `Content-Type` matches `response_format`.
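
The same call can be made from TypeScript without an SDK. The sketch below reuses the URL, headers, and body fields from the curl example; `buildSpeechRequest`, `synthesize`, and the `FetchLike` type are hypothetical names introduced for illustration, and the error handling is an assumption because the error body format is not described here.

```ts
// Hypothetical helper names. Endpoint, headers, and body fields come from the curl example.
const SPEECH_URL = 'https://api.venice.ai/api/v1/audio/speech'

interface SpeechRequest {
  model: string
  voice: string
  input: string
  response_format?: 'mp3' | 'opus' | 'aac' | 'flac' | 'wav' | 'pcm'
  speed?: number
  streaming?: boolean
}

interface HttpInit {
  method: string
  headers: Record<string, string>
  body: string
}

// Minimal structural type so the sketch does not depend on DOM or Node typings.
type FetchLike = (
  url: string,
  init: HttpInit,
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> }>

// Build the POST request without sending it, which keeps this part easy to unit-test.
function buildSpeechRequest(apiKey: string, req: SpeechRequest): { url: string; init: HttpInit } {
  return {
    url: SPEECH_URL,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(req),
    },
  }
}

// Send the request and return the raw audio bytes.
// ASSUMPTION: any non-2xx status is treated as a failure; the error payload is not inspected.
async function synthesize(fetchImpl: FetchLike, apiKey: string, req: SpeechRequest): Promise<Uint8Array> {
  const { url, init } = buildSpeechRequest(apiKey, req)
  const res = await fetchImpl(url, init)
  if (!res.ok) {
    throw new Error(`TTS request failed with HTTP ${res.status}`)
  }
  return new Uint8Array(await res.arrayBuffer())
}
```

In Node 18+ or a browser, the global `fetch` can be passed as `fetchImpl`; the returned bytes can then be written to a file or handed to an audio element.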

## Request schema

| Field | Type | Default | Notes |
|---|---|---|---|
| `input` | string | — | **Required.** Up to **4096** characters. |
| `model` | enum | `tts-kokoro` (OpenAPI schema default) | See model list below. `tts-xai-v1` is the recommended frontier default; pick the model that fits your voice + language needs. |
| `voice` | enum | model-specific (e.g. `eve` for `tts-xai-v1`) | **Voice is model-specific** — wrong combo = `400`. See voice families. |
| `response_format` | `mp3` / `opus` / `aac` / `flac` / `wav` / `pcm` | `mp3` | `pcm` returns 24 kHz signed-16 LE for pipelines. |
| `speed` | number | `1.0` | Range `0.25–4.0`. |
| `streaming` | bool | `false` | `true` → streamed sentence-by-sentence as audio continues to generate. |
| `language` | string | — | Optional hint. Accepted form depends on model (Qwen 3 = full names like `English`; xAI / ElevenLabs = ISO 639-1 like `en`; MiniMax = full names). Unsupported values silently ignored. |
| `prompt` | string, ≤ 500 | — | Emotion / style cue. Only for models with `supportsPromptParam` (Qwen 3 currently). Examples: *"Very happy."*, *"Sad and slow."*. |
| `temperature` | 0–2 | — | Sampling temperature. Only for models with `supportsTemperatureParam` (Qwen 3, Orpheus, Chatterbox HD). |
| `top_p` | 0–1 | — | Only Qwen 3 currently. |
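
The numeric limits in the table can be checked client-side before a request is sent. This is a minimal sketch, not part of the API: `validateSpeechParams` and its messages are hypothetical, and it assumes the 4096 and 500 character limits can be approximated with JavaScript string length (UTF-16 code units), which may not match how the server counts.

```ts
// Limits are copied from the request schema table; names and messages are hypothetical.
interface SpeechParams {
  input: string
  speed?: number
  prompt?: string
  temperature?: number
  top_p?: number
}

// Returns a list of problems; an empty list means the parameters are within the documented ranges.
function validateSpeechParams(p: SpeechParams): string[] {
  const problems: string[] = []
  if (p.input.length === 0) problems.push('input is required')
  // ASSUMPTION: string length is a good enough proxy for the server's character count.
  if (p.input.length > 4096) problems.push('input exceeds 4096 characters')
  if (p.speed !== undefined && (p.speed < 0.25 || p.speed > 4.0)) {
    problems.push('speed must be between 0.25 and 4.0')
  }
  if (p.prompt !== undefined && p.prompt.length > 500) {
    problems.push('prompt exceeds 500 characters')
  }
  if (p.temperature !== undefined && (p.temperature < 0 || p.temperature > 2)) {
    problems.push('temperature must be between 0 and 2')
  }
  if (p.top_p !== undefined && (p.top_p < 0 || p.top_p > 1)) {
    problems.push('top_p must be between 0 and 1')
  }
  return problems
}
```

This checks ranges only. Whether a given model accepts `prompt`, `temperature`, or `top_p` at all still depends on the model, as the Notes column describes.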

## Models

| Model ID | Family | Highlights |
|---|---|---|
| `tts-xai-v1` | xAI | **Recommended default.** Conversational style, ISO 639-1 language hints. |
| `tts-kokoro` | Kokoro | OpenAPI schema default. Multilingual, many voices across languages. |
| `tts-qwen3-0-6b` / `tts-qwen3-1-7b` | Qwen 3 | Emotion control via `prompt`, temperature, top_p. |
| `tts-inworld-1-5-max` | Inworld | Character-driven voices (Craig, Ashley, …). |
| `tts-chatterbox-hd` | Chatterbox | HD voices (Aurora, Blade, …), temperature. |
| `tts-orpheus` | Orpheus | Conversational (tara, leah, jess, leo, …), temperature. |
| `tts-elevenlabs-turbo-v2-5` | ElevenLabs Turbo | Rachel, Aria, Charlotte, Roger, … |
| `tts-minimax-speech-02-hd` | MiniMax | WiseWoman, DeepVoiceMan, … |
| `tts-gemini-3-1-flash` | Gemini Flash | Star-named voices (Achernar, Achird, Zephyr, …). |

Always inspect the entry for your model in `GET /models?type=tts`; `model_spec.voices` is the authoritative voice list. Per-model toggles like `supportsPromptParam`, `supportsTemperatureParam`, and `supportsTopPParam` live on the internal model definitions but are not currently exposed on `/models`, so treat the request schema above (`prompt`, `temperature`, `top_p`) as the support matrix.
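
A voice check against the model list might look like the sketch below. Only `model_spec.voices` is confirmed by the text; the surrounding response shape (a `data` array of entries with an `id`) is an assumption, and `voicesFor` and `assertVoiceSupported` are hypothetical helpers.

```ts
// ASSUMPTION: the list response wraps entries in a `data` array and each entry has an `id`.
// Only `model_spec.voices` is confirmed by the text above.
interface TtsModelEntry {
  id: string
  model_spec?: { voices?: string[] }
}

interface ModelsResponse {
  data: TtsModelEntry[]
}

// Look up the voice list for one model; an unknown model yields an empty list.
function voicesFor(models: ModelsResponse, modelId: string): string[] {
  const entry = models.data.find((m) => m.id === modelId)
  return entry?.model_spec?.voices ?? []
}

// Fail early on the client instead of waiting for the API to reject the voice/model combination.
function assertVoiceSupported(models: ModelsResponse, modelId: string, voice: string): void {
  const voices = voicesFor(models, modelId)
  if (!voices.includes(voice)) {
    throw new Error(`voice "${voice}" is not listed for ${modelId}`)
  }
}
```

Fetching the list once and caching it avoids a round trip per request; how long to cache is a judgment call, since the voice list can change.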

## Voice families (by prefix)

- **Kokoro** — lowercase + language/gender prefix:
  - `af_*`, `am_*` — American female / male
  - `bf_*`, `bm_*` — British female / male
  - `zf_*`, `zm_*` — Chinese
  - `ff_*` (French), `hf_*` / `hm_*` (Hindi), `if_*` / `im_*` (Italian), `jf_*` / `jm_*` (Japanese), `pf_*` / `pm_*` (Portuguese), `ef_*` / `em_*` (Spanish)
  - Examples: `af_sky`, `af_bella`, `am_adam`, `bm_george`, `zf_xiaoxiao`
- **Qwen 3** — `Vivian`, `Serena`, `Ono_Anna`, `Sohee`, `Uncle_Fu`, `Dylan`, `Eric`, `Ryan`, `Aiden`
- **xAI** — `eve`, `ara`, `rex`, `sal`, `leo`
- **Orpheus** — `tara`, `leah`, `jess`, `mia`, `zoe`, `dan`, `zac`
- **Inworld** — `Craig`, `Ashley`, `Olivia`, `Sarah`, `Elizabeth`, `Priya`, `Alex`, `Edward`, `Theodore`, `Ronald`, `Mark`, `Hades`, `Luna`, `Pixie`
- **Chatterbox** — `Aurora`, `Britney`, `Siobhan`, `Vicky`, `Blade`, `Carl`, `Cliff`, `Richard`, `Rico`
- **ElevenLabs Turbo** — `Rachel`, `Aria`, `Laura`, `Charlotte`, `Alice`, `Matilda`, `Jessica`, `Lily`, `Roger`, `Charlie`, `George`, `Callum`, `River`, `Liam`, `Will`, `Chris`, `Brian`, `Daniel`, `Bill`
- **MiniMax** — `WiseWoman`, `FriendlyPerson`, `InspirationalGirl`, `CalmWoman`, `LivelyGirl`, `LovelyGirl`, `SweetGirl`, `ExuberantGirl`, `DeepVoiceMan`, `CasualGuy`, `PatientMan`, `YoungKnight`, `DeterminedMan`, `ImposingManner`, `ElegantMan`
- **Gemini Flash** — star names: `Achernar`, `Achird`, `Algenib`, `Algieba`, `Alnilam`, `Aoede`, `Autonoe`, `Callirrhoe`, `Charon`, `Despina`, `Enceladus`, `Erinome`, `Fenrir`, `Gacrux`, `Iapetus`, `Kore`, `Laomedeia`, `Leda`, `Orus`, `Pulcherrima`, `Puck`, `Rasalgethi`, `Sadachbia`, `Sadaltager`, `Schedar`, `Sulafat`, `Umbriel`, `Vindemiatrix`, `Zephyr`, `Zubenelgenubi`

Pass a voice that isn't in the chosen model's list and you get `400`.
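
The Kokoro prefix convention listed above can be decoded mechanically, for example to label voices in a picker. This is a sketch of the naming scheme only: `describeKokoroVoice` is a hypothetical helper, it assumes the second letter always encodes gender, and it does not prove that a voice exists (the model's voice list stays authoritative).

```ts
// Language letters taken from the Kokoro prefix list above.
const KOKORO_LANGUAGE: Record<string, string> = {
  a: 'American',
  b: 'British',
  z: 'Chinese',
  f: 'French',
  h: 'Hindi',
  i: 'Italian',
  j: 'Japanese',
  p: 'Portuguese',
  e: 'Spanish',
}

interface KokoroVoiceInfo {
  language: string
  gender: 'female' | 'male'
}

// Decode a Kokoro-style ID such as `af_sky`; returns undefined for IDs that do not follow the scheme.
// ASSUMPTION: the second letter is always `f` (female) or `m` (male).
function describeKokoroVoice(voice: string): KokoroVoiceInfo | undefined {
  const match = /^([a-z])([fm])_\w+$/.exec(voice)
  if (match === null) return undefined
  const [, langCode = '', genderCode = ''] = match
  const language = KOKORO_LANGUAGE[langCode]
  if (language === undefined) return undefined
  return { language, gender: genderCode === 'f' ? 'female' : 'male' }
}
```

Voices from other families, such as `eve` or `Rachel`, do not match the pattern and yield `undefined`.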

## Streaming

```json
{
  "model": "tts-xai-v1",
  "voice": "eve",
  "input": "Hello, this is a long document to narrate. ...",
  "streaming": true,
  "response_format": "mp3"
}
```

With `streaming: true`, the HTTP body is a chunked audio stream. Decode it as it arrives, which helps in latency-sensitive UIs. `response_format: pcm` pairs well with the browser's Web Audio API for raw playback.
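
For `pcm` output, each streamed chunk has to be converted from signed 16-bit little-endian integers to the floating-point samples that Web Audio expects. The sketch below shows one way to do that conversion; `Pcm16Decoder` is a hypothetical class, and it assumes mono audio, which the request schema does not state. Network chunks can end mid-sample, so a trailing odd byte is carried over to the next chunk.

```ts
// Converts streamed 16-bit little-endian PCM bytes into Float32 samples in [-1, 1).
// ASSUMPTION: the stream is mono; the channel count is not stated in the schema.
class Pcm16Decoder {
  // Holds a trailing byte when a chunk ends in the middle of a 2-byte sample.
  private leftover: number | undefined

  push(chunk: Uint8Array): Float32Array {
    let bytes = chunk
    if (this.leftover !== undefined) {
      // Prepend the byte held back from the previous chunk.
      bytes = new Uint8Array(chunk.length + 1)
      bytes[0] = this.leftover
      bytes.set(chunk, 1)
      this.leftover = undefined
    }
    const sampleCount = Math.floor(bytes.length / 2)
    if (bytes.length % 2 === 1) {
      this.leftover = bytes[bytes.length - 1]
    }
    const view = new DataView(bytes.buffer, bytes.byteOffset, sampleCount * 2)
    const out = new Float32Array(sampleCount)
    for (let i = 0; i < sampleCount; i++) {
      // `true` selects little-endian byte order.
      out[i] = view.getInt16(i * 2, true) / 32768
    }
    return out
  }
}
```

In a browser, the decoded samples can be copied into an `AudioBuffer` created with a 24000 Hz sample rate (the rate given for `pcm` in the request schema) and scheduled for playback.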

## OpenAI SDK

```ts
import OpenAI from 'openai'
import fs from 'node:fs/promises'

const client = new OpenAI({
  apiKey