ClaudeWave
Skill · 1.1k repo stars · updated today

oma-voice

The oma-voice skill routes text-to-speech, speech-to-text, and voice notification requests to a local Voicebox application via its MCP server, eliminating dependency on cloud vendors. Use it to generate audio files for narration and voiceovers, transcribe local audio recordings to Markdown, trigger voice notifications for task completion, or provide audio infrastructure to other agent skills.

Install in Claude Code
git clone --depth 1 https://github.com/first-fluke/oh-my-agent /tmp/oma-voice && cp -r /tmp/oma-voice/.agents/skills/oma-voice ~/.claude/skills/oma-voice
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Voice Skill - Local TTS and STT via Voicebox

## Scheduling

### Goal
Drive the local Voicebox app through its MCP server so any MCP-aware agent can speak (TTS) or listen (STT) without invoking cloud vendors. The skill standardizes intent routing, voice profile resolution, output layout, and guardrails, while Voicebox itself owns the engines, the voice cloning UI, the captures archive, and the stories editor.

### Intent signature
- User asks to generate speech, narrate text, produce a voiceover, create an mp3 or wav from text.
- User wants an audio file transcribed into text, meeting notes, or a transcript.
- User asks for a voice notification when a long task completes or a workflow step is blocked.
- Another skill needs local audio generation infrastructure.

### When to use
- Generating short notification audio for agent task completion or blockers.
- Producing voiceover, narration, or audio assets (mp3 or wav) for apps and content.
- Transcribing local audio files (mp3, wav, m4a, webm, flac) to Markdown.
- Comparing voice profiles by re-running the same text against different profile ids.

### When NOT to use
- Cloud TTS or high-fidelity multilingual cloud voices -> out of scope; future multi-vendor extension.
- Real-time microphone dictation loop in the terminal -> use Voicebox app's built-in hotkey dictation.
- Voice cloning sample upload and profile creation -> done in the Voicebox desktop app UI.
- Video synthesis, music, sound design -> out of scope.
- Stories Editor multi-voice timeline composition -> use the Voicebox app UI.

### Expected inputs
- TTS: text (<= 5000 chars per call), optional profile id, optional engine, optional language, optional output path.
- STT: audio file path (absolute or relative to `$PWD`), optional language hint.
- Notification: short message (<= 240 chars), profile id resolved from config.
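
The length limits above can be checked before any MCP call is made. The following is a minimal sketch, not the skill's actual implementation: the mode labels `"notify"` and `"asset"` and the `LengthCheck` type are hypothetical names, and `len()` counts Unicode code points, which may differ from how Voicebox counts characters.

```python
from dataclasses import dataclass

TTS_MAX_CHARS = 5000     # per-call TTS limit stated in the inputs list
NOTIFY_MAX_CHARS = 240   # notification limit stated in the inputs list


@dataclass
class LengthCheck:
    ok: bool
    limit: int
    length: int


def check_text_length(text: str, mode: str) -> LengthCheck:
    """Compare the text length against the limit for the given mode.

    `mode` is a hypothetical label: "notify" or "asset".
    """
    limits = {"notify": NOTIFY_MAX_CHARS, "asset": TTS_MAX_CHARS}
    if mode not in limits:
        raise ValueError(f"unknown mode: {mode!r}")
    limit = limits[mode]
    length = len(text)  # code points, an approximation of "chars"
    return LengthCheck(ok=length <= limit, limit=limit, length=length)
```

A failed check is a cue to ask the user how to proceed rather than to truncate silently.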

### Expected outputs
- TTS: audio file (`mp3` default, `wav` optional) at `.agents/results/voice/{timestamp}-{shortid}/output.{mp3|wav}` plus `manifest.json`.
- STT: `transcript.md` at `.agents/results/voice/transcripts/{timestamp}-{shortid}/` plus `manifest.json`.
- Notification: ephemeral playback through Voicebox; no disk write by default.
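
The output layout above could be built as follows. This is a sketch under assumptions: the timestamp format (`%Y%m%dT%H%M%SZ`, UTC) and the six-hex-character short id are illustrative choices, since the source only names the `{timestamp}-{shortid}` placeholders.

```python
import secrets
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

RESULTS_ROOT = Path(".agents/results/voice")


def _run_dir_name(now: Optional[datetime], shortid: Optional[str]) -> str:
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")      # assumed timestamp format
    shortid = shortid or secrets.token_hex(3)   # assumed 6-hex-char id
    return f"{stamp}-{shortid}"


def tts_output_path(fmt: str = "mp3", now: Optional[datetime] = None,
                    shortid: Optional[str] = None) -> Path:
    """Path for generated audio: <root>/{timestamp}-{shortid}/output.{mp3|wav}."""
    if fmt not in ("mp3", "wav"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return RESULTS_ROOT / _run_dir_name(now, shortid) / f"output.{fmt}"


def stt_output_path(now: Optional[datetime] = None,
                    shortid: Optional[str] = None) -> Path:
    """Path for a transcript: <root>/transcripts/{timestamp}-{shortid}/transcript.md."""
    return RESULTS_ROOT / "transcripts" / _run_dir_name(now, shortid) / "transcript.md"
```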

### Dependencies
- Voicebox desktop app installed and running locally.
- Voicebox MCP registered (`claude mcp add --transport http voicebox http://127.0.0.1:17493/mcp`).
- At least one voice profile created in the Voicebox app UI.
- Optionally pre-downloaded engine models for the selected profile.

### Control-flow features
- Branches by mode (notify, asset, transcribe), language, and profile availability.
- Calls Voicebox via MCP tools, with REST `GET /health` as the handshake probe.
- Reads input audio files and writes generated audio plus manifests.
- Caches discovered MCP tool names after the first successful `tools/list`.
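
The tool-name caching described in the last item might look like the sketch below. It is illustrative only: `ToolNameCache` and the injected `list_tools` callable are hypothetical, and the callable stands in for whatever MCP client performs the real `tools/list` request.

```python
from typing import Callable, FrozenSet, Iterable, Optional


class ToolNameCache:
    """Remember MCP tool names after the first successful listing."""

    def __init__(self, list_tools: Callable[[], Iterable[str]]) -> None:
        # `list_tools` is a stand-in for an MCP client's `tools/list` call.
        self._list_tools = list_tools
        self._names: Optional[FrozenSet[str]] = None

    def names(self) -> FrozenSet[str]:
        # Only hit the server when nothing has been cached yet.
        if self._names is None:
            self._names = frozenset(self._list_tools())
        return self._names

    def has(self, name: str) -> bool:
        return name in self.names()

    def refresh(self) -> FrozenSet[str]:
        # Re-run the listing and overwrite the cache, e.g. after name drift.
        self._names = frozenset(self._list_tools())
        return self._names
```

Injecting the listing function keeps the cache independent of any particular MCP client library.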

## Structural Flow

### Entry
1. Detect the requested mode: notification, asset TTS, or transcription.
2. Verify Voicebox is reachable via MCP handshake or `GET /health`.
3. On the first run only, call MCP `tools/list` and cache the resolved tool names.
4. Resolve the target voice profile id (notification, asset, or explicit user choice).
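
Step 2 can be approximated with a plain HTTP probe. This sketch assumes the REST surface shares the host and port of the MCP endpoint and that an HTTP 200 response means healthy; neither the response body nor its schema is specified in this document, so the body is ignored. The `opener` parameter is a hypothetical hook for testing.

```python
import urllib.error
import urllib.request

# Host and port taken from the MCP registration URL; sharing them with the
# REST surface is an assumption.
VOICEBOX_BASE_URL = "http://127.0.0.1:17493"


def voicebox_reachable(base_url: str = VOICEBOX_BASE_URL, timeout: float = 2.0,
                       opener=urllib.request.urlopen) -> bool:
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with opener(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as unreachable.
        return False
```

When the probe returns `False`, the skill surfaces the install or launch hint instead of retrying.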

### Scenes
1. **PREPARE**: Validate text length, audio duration, language, output path, and profile id.
2. **ACQUIRE**: If a required signal is missing, run the clarification protocol once.
3. **ACT**: Invoke the appropriate MCP tool (TTS or STT) with the resolved parameters.
4. **VERIFY**: Confirm the response carries audio output or transcript content. Validate manifest fields.
5. **FINALIZE**: Write `manifest.json` alongside the output. Report the path or transcript to the user.
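
The FINALIZE step writes `manifest.json` next to the output. The manifest schema is not given here, so the sketch below takes the fields as an arbitrary dictionary; the `output` key and any keys passed in `details` are hypothetical, not the skill's real manifest format.

```python
import json
from pathlib import Path
from typing import Any, Dict


def write_manifest(output_path: Path, details: Dict[str, Any]) -> Path:
    """Write manifest.json in the same directory as the generated output."""
    manifest_path = output_path.parent / "manifest.json"
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    # "output" records the file name; every key in `details` is hypothetical.
    payload = {"output": output_path.name, **details}
    manifest_path.write_text(
        json.dumps(payload, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return manifest_path
```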

### Transitions
- If Voicebox is unreachable, surface the install or launch hint and exit. Do not attempt auto-relaunch.
- If `voicebox_list_profiles` is empty, point the user at the Voicebox app UI to create a profile, then exit.
- If a TTS request exceeds 5000 chars, ask whether to truncate or split. Do not auto-chunk in v1.
- If an STT input exceeds 30 minutes, ask whether to proceed. Do not auto-split.
- If the selected engine model is not loaded, ask the user before triggering a download.

### Failure and recovery
| Failure | Recovery |
|---------|----------|
| Voicebox app not running | Print install/launch hint, exit code 5 |
| No voice profile | Print "create a profile in Voicebox" hint, exit code 3 |
| Engine model missing | Ask before triggering download |
| Output path outside `$PWD` | Warn the user, require explicit confirmation |
| TTS over 5000 chars | Ask the user to split or truncate |
| STT over 30 minutes | Ask the user to confirm |
| MCP tool name drift | Re-run `tools/list` and update the cache |
| SIGINT | Abort the MCP call, write no partial output |
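
The "output path outside `$PWD`" guardrail in the table reduces to a containment check. A minimal sketch, assuming symlinks should be resolved before comparing; the helper name `is_inside_cwd` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def is_inside_cwd(candidate: str, cwd: Optional[Path] = None) -> bool:
    """Return True if `candidate` resolves to a location under the working directory."""
    root = (cwd or Path.cwd()).resolve()
    # Joining with an absolute candidate yields the candidate itself,
    # so both relative and absolute inputs are handled.
    target = (root / candidate).resolve()
    return target == root or root in target.parents
```

A `False` result is the cue to warn the user and require explicit confirmation before writing.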

### Exit
- Success: audio file or transcript exists with a complete manifest, and the path is reported.
- Partial success: output exists but a guardrail warning is surfaced (length, disk, model fallback).
- Failure: no output is produced, and the blocker (auth, profile, engine, network) is stated explicitly.

## Logical Operations

### Actions
| Action | SSL primitive | Evidence |
|--------|---------------|----------|
| Validate mode and inputs | `VALIDATE` | Clarification protocol in execution-protocol.md |
| Resolve voice profile | `SELECT` | `voicebox_list_profiles` + config defaults |
| Health check | `READ` | MCP handshake or `GET /health` |
| Generate speech | `CALL_TOOL` | MCP `voicebox_speak` |
| Transcribe audio | `CALL_TOOL` | MCP `voicebox_transcribe` |
| Write output and manifest | `WRITE` | Audio or transcript plus `manifest.json` |
| Inspect result | `VALIDATE` | Output presence, duration, manifest fields |
| Report result | `NOTIFY` | Final user-facing summary |
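
For the `CALL_TOOL` rows, an MCP tool invocation is a JSON-RPC 2.0 `tools/call` request. The sketch below only builds the request body for `voicebox_speak`; the argument keys `text` and `profile_id` are hypothetical, and the real names should be read from the tool's input schema in the `tools/list` response.

```python
from itertools import count
from typing import Any, Dict, Optional

_request_ids = count(1)  # JSON-RPC ids only need to be unique per session


def build_speak_request(text: str, profile_id: Optional[str] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request for `voicebox_speak`."""
    # Argument keys are assumptions, not the tool's documented schema.
    arguments: Dict[str, Any] = {"text": text}
    if profile_id is not None:
        arguments["profile_id"] = profile_id
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": "voicebox_speak", "arguments": arguments},
    }
```

In practice an MCP client library would build and send this message; the sketch is only meant to show what crosses the wire.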

### Tools and instruments
- Voicebox MCP server at `http://127.0.0.1:17493/mcp`.
- REST surface for health and audio retrieval (`GET /health`, `GET /audio/{generation_id}`).
- Resource references: voice matrix, prompt tips, execution protocol, checklist.

### Canonical command