venice-audio-speech

Generate speech from text via POST /audio/speech. Covers TTS models (Kokoro, Qwen 3, xAI, Inworld, Chatterbox, Orpheus, ElevenLabs Turbo, MiniMax, Gemini Flash), voices per family, output formats (mp3/opus/aac/flac/wav/pcm), streaming, prompt/emotion styling, temperature/top_p, and language hints.

Install in Claude Code
git clone --depth 1 https://github.com/veniceai/skills /tmp/venice-audio-speech && cp -r /tmp/venice-audio-speech/skills/venice-audio-speech ~/.claude/skills/venice-audio-speech
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Venice TTS (`/audio/speech`)

`POST /api/v1/audio/speech` converts text to an audio stream or file. OpenAI-compatible — the OpenAI SDK's `audio.speech.create()` works as a drop-in.

## Use when

- You want narration, voice replies, or UI audio from text.
- You need a specific voice family (ElevenLabs, Kokoro, xAI, Qwen 3, Orpheus, Chatterbox, MiniMax, Inworld, Gemini Flash).
- You want streaming audio returned sentence-by-sentence.
- You need style/emotion control on supported models.

For music generation (lyrics + instrumental), see [`venice-audio-music`](../venice-audio-music/SKILL.md). For transcription (audio → text), see [`venice-audio-transcription`](../venice-audio-transcription/SKILL.md).

## Minimal request

```bash
curl https://api.venice.ai/api/v1/audio/speech \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-xai-v1",
    "voice": "eve",
    "input": "Hello, welcome to Venice Voice.",
    "response_format": "mp3",
    "speed": 1.0,
    "streaming": false
  }' --output hello.mp3
```

Response is the raw audio (`Content-Type` matches `response_format`).
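
For callers working outside the shell, the same request can be assembled in TypeScript. This is a minimal sketch rather than an official client: `buildSpeechRequest`, `SpeechRequestBody`, and `BuiltRequest` are hypothetical names, and only the URL, headers, and body fields from the curl example are assumed.

```typescript
// Hypothetical helper mirroring the curl example; not part of any Venice SDK.
interface SpeechRequestBody {
  model: string;
  voice: string;
  input: string;
  response_format?: string;
  speed?: number;
  streaming?: boolean;
}

interface BuiltRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildSpeechRequest(apiKey: string, body: SpeechRequestBody): BuiltRequest {
  return {
    url: "https://api.venice.ai/api/v1/audio/speech",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// "YOUR_API_KEY" is a placeholder; supply the real key from your environment.
const speechReq = buildSpeechRequest("YOUR_API_KEY", {
  model: "tts-xai-v1",
  voice: "eve",
  input: "Hello, welcome to Venice Voice.",
  response_format: "mp3",
});

// In a runtime with fetch, the audio bytes could then be read like this:
// const res = await fetch(speechReq.url, speechReq.init);
// const audio = new Uint8Array(await res.arrayBuffer());
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before it is sent.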

## Request schema

| Field | Type | Default | Notes |
|---|---|---|---|
| `input` | string | — | **Required.** Up to **4096** characters. |
| `model` | enum | `tts-kokoro` (OpenAPI schema default) | See model list below. `tts-xai-v1` is the recommended frontier default; pick the model that fits your voice + language needs. |
| `voice` | enum | model-specific (e.g. `eve` for `tts-xai-v1`) | **Voice is model-specific** — wrong combo = `400`. See voice families. |
| `response_format` | `mp3` / `opus` / `aac` / `flac` / `wav` / `pcm` | `mp3` | `pcm` returns raw 24 kHz signed 16-bit little-endian samples, suited to audio pipelines. |
| `speed` | number | `1.0` | Range `0.25–4.0`. |
| `streaming` | bool | `false` | `true` → audio is streamed back sentence by sentence while generation continues. |
| `language` | string | — | Optional hint. Accepted form depends on model (Qwen 3 = full names like `English`; xAI / ElevenLabs = ISO 639-1 like `en`; MiniMax = full names). Unsupported values silently ignored. |
| `prompt` | string, ≤ 500 | — | Emotion / style cue. Only for models with `supportsPromptParam` (Qwen 3 currently). Examples: *"Very happy."*, *"Sad and slow."*. |
| `temperature` | 0–2 | — | Sampling temperature. Only for models with `supportsTemperatureParam` (Qwen 3, Orpheus, Chatterbox HD). |
| `top_p` | 0–1 | — | Only Qwen 3 currently. |
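
The limits in the table can be checked on the client before a request is sent. The sketch below is illustrative only: `validateSpeechParams` and `SpeechParams` are hypothetical names, and it assumes the character limits can be approximated with JavaScript string length (UTF-16 code units), which may not match how the server counts. Treating an empty `input` as an error is also an assumption.

```typescript
// Hypothetical pre-flight check built from the documented limits.
interface SpeechParams {
  input: string;
  speed?: number;
  prompt?: string;
  temperature?: number;
  top_p?: number;
}

function validateSpeechParams(p: SpeechParams): string[] {
  const errors: string[] = [];
  // Assumption: an empty string counts as a missing required input.
  if (p.input.length === 0) errors.push("input is required");
  if (p.input.length > 4096) errors.push("input exceeds 4096 characters");
  if (p.speed !== undefined && (p.speed < 0.25 || p.speed > 4.0)) {
    errors.push("speed must be between 0.25 and 4.0");
  }
  if (p.prompt !== undefined && p.prompt.length > 500) {
    errors.push("prompt exceeds 500 characters");
  }
  if (p.temperature !== undefined && (p.temperature < 0 || p.temperature > 2)) {
    errors.push("temperature must be between 0 and 2");
  }
  if (p.top_p !== undefined && (p.top_p < 0 || p.top_p > 1)) {
    errors.push("top_p must be between 0 and 1");
  }
  return errors;
}

// Returns an empty array when every documented limit is respected.
const validationErrors = validateSpeechParams({ input: "Hello", speed: 1.0 });
```

This check covers value ranges only. Whether a given model accepts `prompt`, `temperature`, or `top_p` at all still depends on the model.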

## Models

| Model ID | Family | Highlights |
|---|---|---|
| `tts-xai-v1` | xAI | **Recommended default.** Conversational style, ISO 639-1 language hints. |
| `tts-kokoro` | Kokoro | OpenAPI schema default. Multilingual, many voices across languages. |
| `tts-qwen3-0-6b` / `tts-qwen3-1-7b` | Qwen 3 | Emotion control via `prompt`, temperature, top_p. |
| `tts-inworld-1-5-max` | Inworld | Character-driven voices (Craig, Ashley, …). |
| `tts-chatterbox-hd` | Chatterbox | HD voices (Aurora, Blade, …), temperature. |
| `tts-orpheus` | Orpheus | Conversational (tara, leah, jess, leo, …), temperature. |
| `tts-elevenlabs-turbo-v2-5` | ElevenLabs Turbo | Rachel, Aria, Charlotte, Roger, … |
| `tts-minimax-speech-02-hd` | MiniMax | WiseWoman, DeepVoiceMan, … |
| `tts-gemini-3-1-flash` | Gemini Flash | Star-named voices (Achernar, Achird, Zephyr, …). |

Always inspect the entry for your model in `GET /models?type=tts`; `model_spec.voices` is the authoritative voice list. Per-model toggles such as `supportsPromptParam`, `supportsTemperatureParam`, and `supportsTopPParam` live on the internal model definitions but are not currently exposed on `/models`, so treat the request schema above (`prompt`, `temperature`, `top_p`) as the support matrix.
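
Reading the voice list out of a `/models` response might look like the sketch below. Only `model_spec.voices` is named in this document. The `data` array envelope and the `id` field are assumptions about the response shape, `voicesFor` is a hypothetical helper, and the sample payload is hand-written for illustration, not captured from the API.

```typescript
// Assumed response shape: { data: [{ id, model_spec: { voices } }] }.
interface TtsModelEntry {
  id: string;
  model_spec?: { voices?: string[] };
}

interface ModelsResponse {
  data: TtsModelEntry[];
}

// Returns the voices listed for a model, or an empty array if none are found.
function voicesFor(models: ModelsResponse, modelId: string): string[] {
  const entry = models.data.find((m) => m.id === modelId);
  return entry?.model_spec?.voices ?? [];
}

// Hand-written stand-in for a real GET /models?type=tts response.
const sampleModels: ModelsResponse = {
  data: [
    { id: "tts-xai-v1", model_spec: { voices: ["eve", "ara", "rex", "sal", "leo"] } },
  ],
};

const xaiVoices = voicesFor(sampleModels, "tts-xai-v1");
const canUseEve = xaiVoices.includes("eve");
```

Checking the chosen voice against this list before calling `/audio/speech` avoids the `400` returned for mismatched model and voice combinations.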

## Voice families (by prefix)

- **Kokoro** — lowercase + language/gender prefix:
  - `af_*`, `am_*` — American female / male
  - `bf_*`, `bm_*` — British female / male
  - `zf_*`, `zm_*` — Chinese
  - `ff_*`, `hf_*`, `hm_*`, `if_*`, `im_*`, `jf_*`, `jm_*`, `pf_*`, `pm_*`, `ef_*`, `em_*` — French, Hindi, Italian, Japanese, Portuguese, Spanish
  - Examples: `af_sky`, `af_bella`, `am_adam`, `bm_george`, `zf_xiaoxiao`
- **Qwen 3** — `Vivian`, `Serena`, `Ono_Anna`, `Sohee`, `Uncle_Fu`, `Dylan`, `Eric`, `Ryan`, `Aiden`
- **xAI** — `eve`, `ara`, `rex`, `sal`, `leo`
- **Orpheus** — `tara`, `leah`, `jess`, `mia`, `zoe`, `dan`, `zac`
- **Inworld** — `Craig`, `Ashley`, `Olivia`, `Sarah`, `Elizabeth`, `Priya`, `Alex`, `Edward`, `Theodore`, `Ronald`, `Mark`, `Hades`, `Luna`, `Pixie`
- **Chatterbox** — `Aurora`, `Britney`, `Siobhan`, `Vicky`, `Blade`, `Carl`, `Cliff`, `Richard`, `Rico`
- **ElevenLabs Turbo** — `Rachel`, `Aria`, `Laura`, `Charlotte`, `Alice`, `Matilda`, `Jessica`, `Lily`, `Roger`, `Charlie`, `George`, `Callum`, `River`, `Liam`, `Will`, `Chris`, `Brian`, `Daniel`, `Bill`
- **MiniMax** — `WiseWoman`, `FriendlyPerson`, `InspirationalGirl`, `CalmWoman`, `LivelyGirl`, `LovelyGirl`, `SweetGirl`, `ExuberantGirl`, `DeepVoiceMan`, `CasualGuy`, `PatientMan`, `YoungKnight`, `DeterminedMan`, `ImposingManner`, `ElegantMan`
- **Gemini 3 Flash** — star names: `Achernar`, `Achird`, `Algenib`, `Algieba`, `Alnilam`, `Aoede`, `Autonoe`, `Callirrhoe`, `Charon`, `Despina`, `Enceladus`, `Erinome`, `Fenrir`, `Gacrux`, `Iapetus`, `Kore`, `Laomedeia`, `Leda`, `Orus`, `Pulcherrima`, `Puck`, `Rasalgethi`, `Sadachbia`, `Sadaltager`, `Schedar`, `Sulafat`, `Umbriel`, `Vindemiatrix`, `Zephyr`, `Zubenelgenubi`

Pass a voice that isn't in the chosen model's list and you get `400`.
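
The Kokoro prefix convention listed above can be decoded mechanically. The following sketch assumes every Kokoro voice ID has the form `<language letter><f|m>_<name>`. `describeKokoroVoice` and `KOKORO_LANGUAGES` are hypothetical names, and the function only interprets the naming pattern; it does not confirm that a voice actually exists on the model.

```typescript
// Language letters taken from the prefix list above.
const KOKORO_LANGUAGES: Record<string, string> = {
  a: "American",
  b: "British",
  z: "Chinese",
  f: "French",
  h: "Hindi",
  i: "Italian",
  j: "Japanese",
  p: "Portuguese",
  e: "Spanish",
};

interface KokoroVoiceInfo {
  language: string;
  gender: "female" | "male";
  name: string;
}

// Returns null when the ID does not follow the assumed prefix pattern.
function describeKokoroVoice(voice: string): KokoroVoiceInfo | null {
  const match = /^([a-z])([fm])_([a-z0-9]+)$/.exec(voice);
  if (match === null) return null;
  const [, langCode = "", genderCode = "", name = ""] = match;
  const language = KOKORO_LANGUAGES[langCode];
  if (language === undefined) return null;
  return { language, gender: genderCode === "f" ? "female" : "male", name };
}

const skyInfo = describeKokoroVoice("af_sky");
const georgeInfo = describeKokoroVoice("bm_george");
const notKokoro = describeKokoroVoice("eve"); // xAI voice, no Kokoro prefix
```

A helper like this is useful for grouping voices in a picker UI, while `model_spec.voices` remains the source of truth for what is accepted.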

## Streaming

```json
{
  "model": "tts-xai-v1",
  "voice": "eve",
  "input": "Hello, this is a long document to narrate. ...",
  "streaming": true,
  "response_format": "mp3"
}
```

With `streaming: true`, the HTTP body is a chunked audio stream. Decode it as it arrives, which helps latency-sensitive UIs. `response_format: pcm` pairs well with the browser Web Audio API for raw playback.
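
For raw playback, `pcm` chunks have to be converted to the floating-point samples that Web Audio expects. The sketch below shows one way to do that, assuming the 24 kHz signed 16-bit little-endian format from the request schema and a single (mono) channel, which this document does not confirm. `pcm16leToFloat32` is a hypothetical helper.

```typescript
// Sample rate stated for the pcm response format.
const PCM_SAMPLE_RATE = 24000;

// Converts signed 16-bit little-endian PCM bytes to floats in [-1, 1).
// A trailing odd byte (a sample split across network chunks) is dropped here;
// a real player should carry it over and prepend it to the next chunk.
function pcm16leToFloat32(bytes: Uint8Array): Float32Array {
  const sampleCount = Math.floor(bytes.byteLength / 2);
  const view = new DataView(bytes.buffer, bytes.byteOffset, sampleCount * 2);
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}

// Three samples: silence, maximum positive, maximum negative.
const pcmSamples = pcm16leToFloat32(new Uint8Array([0x00, 0x00, 0xff, 0x7f, 0x00, 0x80]));

// In a browser, assuming mono audio, the samples could be scheduled like this:
// const ctx = new AudioContext();
// const buffer = ctx.createBuffer(1, pcmSamples.length, PCM_SAMPLE_RATE);
// buffer.copyToChannel(pcmSamples, 0);
// const source = ctx.createBufferSource();
// source.buffer = buffer;
// source.connect(ctx.destination);
// source.start();
```

Dividing by 32768 maps the full 16-bit range onto [-1, 1), so the most negative sample becomes exactly -1.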

## OpenAI SDK

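```ts
import OpenAI from 'openai'
import fs from 'node:fs/promises'

const client = new OpenAI({
  apiKey: process.env.VENICE_API_KEY,
  baseURL: 'https://api.venice.ai/api/v1', // base URL taken from the curl example
})
// The source snippet is cut off at this point; the call below is a sketch built from the documented fields.
const speech = await client.audio.speech.create({ model: 'tts-xai-v1', voice: 'eve', input: 'Hello, welcome to Venice Voice.' })
await fs.writeFile('hello.mp3', Buffer.from(await speech.arrayBuffer()))
```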