fish-audio
The fish-audio skill generates expressive audio clips using Fish Audio's S2 Pro text-to-speech API with bracket notation for emotional tags. Use this to create narration, voice memos, announcements, or any spoken content that requires dynamic emotional expression, with support for combining multiple clips through ffmpeg.
git clone --depth 1 https://github.com/vellum-ai/vellum-assistant /tmp/fish-audio && cp -r /tmp/fish-audio/skills/fish-audio ~/.claude/skills/fish-audio
# Fish Audio TTS
Generate expressive audio clips using the Fish Audio S2 Pro TTS API with `[bracket]` emotion tags.
## Overview
This skill lets you create audio clips on demand: narration, announcements, podcast intros, dramatic readings, voice memos, or any other spoken content. It uses Fish Audio S2 Pro with the full bracket syntax for emotional expressiveness.
## Configuration
- **API Endpoint:** `https://api.fish.audio/v1/tts`
- **Model:** `s2-pro`
- **Voice Reference ID:** Configured via `assistant config get services.tts.providers.fish-audio.referenceId`
- **API Key:** Stored as credential `fish-audio/api_key`
- **Default Format:** `mp3` at 192 kbps
- **Default Output Directory:** `scratch/`
## API Key Setup
The Fish Audio API key must be stored securely via the credential store. Get an API key from the Fish Audio dashboard at https://fish.audio.
Check if the key is already configured:
```bash
assistant credentials inspect --service fish-audio --field api_key --json
```
If not set, collect it securely (never ask the user to paste it in chat):
```
credential_store action="prompt" service="fish-audio" field="api_key" label="Fish Audio API Key" description="Enter your Fish Audio API key" placeholder="sk-..."
```
## Generating a Single Clip
Use `bash` with `curl` to call the Fish Audio API:
```bash
# Replace the text and OUTPUT_FILENAME before running.
curl -s -X POST "https://api.fish.audio/v1/tts" \
  -H "Authorization: Bearer $(assistant credentials reveal --service fish-audio --field api_key)" \
  -H "Content-Type: application/json" \
  -H "model: s2-pro" \
  -d '{
    "text": "YOUR TEXT WITH [bracket] TAGS HERE",
    "reference_id": "'"$(assistant config get services.tts.providers.fish-audio.referenceId)"'",
    "format": "mp3",
    "mp3_bitrate": 192,
    "temperature": 0.8
  }' \
  --output scratch/OUTPUT_FILENAME.mp3
```
**Important:** This API call requires network access. Always use `network_mode: proxied` when running this command.
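Text that contains double quotes or backslashes would break the inline JSON in the command above. The following is a minimal sketch of one way to build the request body safely. The helper names `json_escape` and `build_tts_payload` are hypothetical and not part of the skill, and the escaping covers only backslashes and double quotes (not newlines or control characters).

```bash
# Hypothetical helper (not part of the skill): escape a string for use inside
# a JSON string literal. Covers backslashes and double quotes only.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a request body with the same fields as the curl example above.
# $1 = text with [bracket] tags, $2 = voice reference ID.
build_tts_payload() {
  printf '{"text":"%s","reference_id":"%s","format":"mp3","mp3_bitrate":192,"temperature":0.8}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Usage (needs network access; API_KEY and REFERENCE_ID are assumed to have
# been read beforehand with the assistant commands shown above):
# build_tts_payload '[excited] She said "hello"!' "$REFERENCE_ID" |
#   curl -s --fail -X POST "https://api.fish.audio/v1/tts" \
#     -H "Authorization: Bearer $API_KEY" \
#     -H "Content-Type: application/json" \
#     -H "model: s2-pro" \
#     -d @- --output scratch/clip1.mp3
```

In the commented usage, `-d @-` makes curl read the body from standard input, and `--fail` makes curl exit non-zero on an HTTP error, which makes a failed request easier to detect than inspecting the output file.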
## Generating Multiple Clips & Combining
For longer pieces (narrations, multi-part messages), generate each clip separately, then combine the clips with ffmpeg:
### 1. Generate silence for gaps between clips
```bash
ffmpeg -f lavfi -i anullsrc=r=44100:cl=mono -t 1.5 -q:a 9 -acodec libmp3lame scratch/silence.mp3 -y
```
### 2. Create a concat file
```bash
cat > scratch/concat.txt << 'EOF'
file 'clip1.mp3'
file 'silence.mp3'
file 'clip2.mp3'
file 'silence.mp3'
file 'clip3.mp3'
EOF
```
### 3. Combine
```bash
ffmpeg -f concat -safe 0 -i scratch/concat.txt -c copy scratch/final_output.mp3 -y
```
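When the number of clips varies, the concat file from step 2 can be generated rather than written by hand. This is a sketch only: `write_concat_list` is a hypothetical helper, and it assumes the clip file names contain no single quotes and that `silence.mp3` sits in the same directory as the list file.

```bash
# Hypothetical helper: write an ffmpeg concat list to the file named in $1,
# using the remaining arguments as clip names and inserting silence.mp3
# between consecutive clips.
write_concat_list() {
  list_file=$1
  shift
  : > "$list_file"            # create or truncate the list file
  first=1
  for clip in "$@"; do
    if [ "$first" -eq 0 ]; then
      printf "file 'silence.mp3'\n" >> "$list_file"
    fi
    printf "file '%s'\n" "$clip" >> "$list_file"
    first=0
  done
}

# Usage:
# write_concat_list scratch/concat.txt clip1.mp3 clip2.mp3 clip3.mp3
```

Relative paths in a concat list are resolved against the directory of the list file, which is why the entries are written as `clip1.mp3` and not `scratch/clip1.mp3`.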
## Bracket Syntax — Complete Guide
Fish Audio S2 uses `[bracket]` syntax for inline emotion and prosody control. This is the core of what makes the voice expressive. Tags are natural-language instructions placed directly in the text that control how words are spoken — the delivery, emotion, pacing, or vocal quality at that exact point.
**Key principle:** You are not choosing from a fixed menu. You write the description, and S2 interprets it. If you can describe it to a voice actor, S2 can attempt it. More than 15,000 unique tags are supported, and the system understands free-form descriptions.
### How Placement Works
Tags affect what comes **after** them. Place the tag at the **exact point** where the shift should happen. Placement IS meaning.
```
[whispering] I didn't want to go inside. <- whispers the entire line
I didn't want to go [whispering] inside. <- only whispers from "inside" onward
```
Tags can go **anywhere** — start, middle, or end of a sentence. They apply from the point they appear until the next tag or end of the sentence.
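To double-check placement before spending an API call, it can help to print each tag together with the text it governs. The sketch below is a local preview only: `preview_tag_spans` is a hypothetical helper, and it simply splits the line at every `[` without consulting the API or validating tags.

```bash
# Hypothetical helper: print the text one span per line, starting a new line
# at every opening bracket, so each tag appears next to the words that follow it.
preview_tag_spans() {
  printf '%s\n' "$1" |
    awk '{ gsub(/\[/, "\n["); print }' |   # break the line before each tag
    sed '/^[[:space:]]*$/d'                # drop empty lines
}

# Usage:
# preview_tag_spans "I didn't want to go [whispering] inside."
```

For the second example above, this prints the untagged opening words on one line and `[whispering] inside.` on the next, which makes it clear that only the final word is whispered.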
### Well-Tested Tags (Reliable Out of the Box)
These tags consistently produce strong results. Organized by category:
#### Emotions
| Tag | Effect | Best For |
| --------------- | ----------------------- | --------------------------- |
| `[happy]` | Cheerful, upbeat | Good news, greetings |
| `[sad]` | Melancholic, downcast | Sympathy, vulnerability |
| `[angry]` | Frustrated, aggressive | Arguments, complaints |
| `[excited]` | Energetic, enthusiastic | Celebrations, announcements |
| `[surprised]` | Shocked, amazed | Reactions, discoveries |
| `[embarrassed]` | Awkward, flustered | Mistakes, confessions |
| `[delight]` | Very pleased, joyful | Genuine happiness |
| `[nervous]` | Anxious, uncertain | Vulnerability, apologies |
| `[confident]` | Assertive, self-assured | Bold statements |
| `[nostalgic]` | Longing for the past | Memories, stories |
| `[scared]` | Frightened, fearful | Warnings, tension |
| `[jealous]` | Envious, resentful | Comparisons, possessiveness |
| `[shocked]` | Sudden realization | Dramatic reveals |
| `[moved]` | Emotionally touched | Heartfelt moments |
#### Voice Quality & Style
| Tag | Effect | Best For |
| ---------------------- | -------------------- | -------------------------- |
| `[soft]` | Gentle, tender | Intimate moments, kindness |
| `[whisper]` | Very quiet, close | Secrets, tension, suspense |
| `[breathy]` | Airy, expressive | Vulnerability, emphasis |
| `[low voice]` | Deep, quiet register | Gravity, seriousness |
| `[loud]` | Raised volume | Emphasis, excitement |
| `[screaming]` | Full volume yelling | Anger, extreme excitement |
| `[shouting]` | Forceful projection | Arguments, calling out |
| `[emphasis]` | Stressed delivery | Key words, making a point |
| `[singing]` | Musical quality | Playfulness, joy |
| `[echo]` | Reverberant effect | Dramatic moments |
| `[with strong accent]` | Pronounced accent | Character work |