ClaudeWave
Skill · 730 repo stars · updated 15d ago

livekit-skills

The livekit-skills item provides guidance for building voice AI agents using the LiveKit Agents SDK, covering both cloud-hosted and self-managed deployments. Use this skill when a user requests building voice agents, creating LiveKit agents, adding voice AI capabilities, implementing handoffs, structuring agent workflows, or working directly with the LiveKit Agents SDK framework and associated tools.

Install in Claude Code
git clone --depth 1 https://github.com/fcakyon/claude-codex-settings /tmp/livekit-skills && cp -r /tmp/livekit-skills/plugins/livekit-skills/skills/livekit-skills ~/.claude/skills/livekit-skills
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# LiveKit Voice Agent Development

This skill provides guidance for building voice AI agents with the LiveKit Agents SDK. It covers both LiveKit Cloud and self-hosted deployments, using the `lk` CLI for documentation access and project management. All factual information about APIs, methods, and configurations must come from live documentation.


## MANDATORY: Read This Checklist Before Starting

Before writing ANY code, complete this checklist:

1. **Read this entire skill document** - Do not skip sections
2. **Set up LiveKit credentials** (Cloud project or self-hosted server) - You need `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET`
3. **Set up documentation access** - Install `lk` CLI for `lk docs` commands
4. **Plan to write tests** - Every agent implementation MUST include tests (see testing section below)
5. **Verify all APIs against live docs** - Never rely on model memory for LiveKit APIs


## Setup

### LiveKit Cloud

LiveKit Cloud is the fastest way to get a voice agent running. It provides:
- Managed infrastructure (no servers to deploy)
- **LiveKit Inference** for AI models (no separate API keys needed)
- Built-in noise cancellation, turn detection, and other voice features
- Simple credential management

### Connect to Your Cloud Project

1. Sign up at [cloud.livekit.io](https://cloud.livekit.io) if you haven't already
2. Create a project (or use an existing one)
3. Get your credentials from the project settings:
   - `LIVEKIT_URL` - Your project's WebSocket URL (e.g., `wss://your-project.livekit.cloud`)
   - `LIVEKIT_API_KEY` - API key for authentication
   - `LIVEKIT_API_SECRET` - API secret for authentication

4. Set these as environment variables (typically in `.env.local`):
```bash
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret
```
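Before starting an agent, it can help to confirm that the three variables above are visible to the process. The following is a minimal sketch, assuming the values have already been loaded into the environment (how `.env.local` gets loaded depends on your project). `missing_livekit_vars` is a hypothetical helper name and is not part of the LiveKit SDK.

```python
import os

# The three settings named in the checklist above.
REQUIRED_VARS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")


def missing_livekit_vars(env=None):
    """Return the required LiveKit variable names that are unset or empty.

    `env` defaults to the process environment; pass a dict to check
    values from another source (hypothetical helper, not an SDK call).
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

A startup script could call `missing_livekit_vars()` and stop with a clear message if the returned list is not empty.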

The LiveKit CLI can automate credential setup. Consult the CLI documentation for current commands.

### Use LiveKit Inference for AI Models

LiveKit Inference is one option for AI model access when using LiveKit Cloud. It gives access to leading AI model providers through your LiveKit credentials, with no separate API keys needed.

Benefits of LiveKit Inference:
- No separate API keys to manage for each AI provider
- Billing consolidated through your LiveKit Cloud account
- Optimized for voice AI workloads

Consult the documentation for available models, supported providers, and current usage patterns; it is the most up-to-date source.

### Self-Hosted Setup

Self-hosting removes Cloud tier limits on deployments and concurrency. You control scaling directly.

#### Local development
Install and run the LiveKit server:
- macOS: `brew install livekit`
- Linux: `curl -sSL https://get.livekit.io | bash`

Start in dev mode:
```bash
livekit-server --dev
```
Default credentials: API key `devkey`, API secret `secret`.

Set environment variables:
```bash
LIVEKIT_URL=ws://localhost:7880
LIVEKIT_API_KEY=devkey
LIVEKIT_API_SECRET=secret
```
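Because the dev-mode key and secret are fixed defaults, one possible safeguard is to flag any configuration that pairs them with a non-local server. This is a sketch under assumptions: the list of local hostnames and the helper name `uses_dev_credentials_remotely` are illustrative and do not come from the LiveKit documentation.

```python
from urllib.parse import urlparse

# Dev-mode defaults stated above.
DEV_KEY, DEV_SECRET = "devkey", "secret"
# Assumption: hosts treated as "local" for this check.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def uses_dev_credentials_remotely(url, api_key, api_secret):
    """Return True when the dev key/secret pair targets a non-local host."""
    host = urlparse(url).hostname
    is_local = host in LOCAL_HOSTS
    is_dev_pair = api_key == DEV_KEY and api_secret == DEV_SECRET
    return is_dev_pair and not is_local
```

For example, `uses_dev_credentials_remotely("ws://localhost:7880", "devkey", "secret")` returns `False`, while the same credentials with a `wss://your-project.livekit.cloud` URL return `True`.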

#### Production deployment
Deploy `livekit-server` via Docker, Kubernetes, or VMs on any provider (Hetzner, AWS, GCP, etc.). Consult `lk docs get-page /home/self-hosting` or see `references/self-hosting.md` for details. Agent servers run as regular processes managed by your infra tooling.

### Using Your Own Model Providers

When self-hosting or when you prefer your own API keys over LiveKit Inference, configure model providers directly via environment variables:

```bash
# STT (Speech-to-Text)
DEEPGRAM_API_KEY=your-key

# LLM
OPENAI_API_KEY=your-key

# TTS (Text-to-Speech)
ELEVEN_API_KEY=your-key
# or
CARTESIA_API_KEY=your-key
```
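To see at a glance which pipeline roles have a key configured, the variables above can be grouped by role. This is a minimal sketch, not an SDK feature: the mapping covers only the four variables listed above, and `configured_providers` and `roles_without_key` are hypothetical helper names.

```python
import os

# Role -> environment variables from the example above.
PROVIDER_KEYS = {
    "stt": ("DEEPGRAM_API_KEY",),
    "llm": ("OPENAI_API_KEY",),
    "tts": ("ELEVEN_API_KEY", "CARTESIA_API_KEY"),
}


def configured_providers(env=None):
    """Map each role to the key variables that are set and non-empty."""
    env = os.environ if env is None else env
    return {
        role: [name for name in names if env.get(name)]
        for role, names in PROVIDER_KEYS.items()
    }


def roles_without_key(env=None):
    """List the roles for which no provider key was found."""
    return [role for role, found in configured_providers(env).items() if not found]
```

With only `DEEPGRAM_API_KEY` and `CARTESIA_API_KEY` set, `roles_without_key` returns `["llm"]`, which points at the missing `OPENAI_API_KEY`.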

The Agents SDK has plugins for all major providers. Pass model identifiers directly:

**Node.js / TypeScript:**
```typescript
import { voice } from "@livekit/agents";

const session = new voice.AgentSession({
  stt: "deepgram/nova-3:multi",
  llm: "openai/gpt-4.1-mini",
  tts: "cartesia/sonic-3:voice-id",  // or "elevenlabs/..."
});
```

**Python:**
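```python
from livekit.agents import AgentSession

# Each pipeline stage is selected by a model identifier string.
session = AgentSession(
    stt="deepgram/nova-3",
    llm="openai/gpt-4.1-mini",
    tts="elevenlabs/...",  # or "cartesia/sonic-3:voice-id"
)
```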

Consult `lk docs search "plugins"` for the full list of supported providers.
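The identifier strings in the examples above appear to follow a `provider/model[:variant]` shape. That reading is an assumption inferred from the examples, not a documented grammar, so verify it against the live docs. Under that assumption, a small parser can catch malformed strings before they reach the session. `ModelId` and `parse_model_id` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ModelId(NamedTuple):
    provider: str
    model: str
    variant: Optional[str]  # e.g. "multi" or a voice id; None when absent


def parse_model_id(identifier: str) -> ModelId:
    """Split an assumed 'provider/model[:variant]' string into its parts."""
    provider, sep, rest = identifier.partition("/")
    if not sep or not provider or not rest:
        raise ValueError(f"expected 'provider/model', got {identifier!r}")
    model, _, variant = rest.partition(":")
    if not model:
        raise ValueError(f"missing model name in {identifier!r}")
    return ModelId(provider, model, variant or None)
```

For instance, `parse_model_id("deepgram/nova-3:multi")` yields provider `deepgram`, model `nova-3`, and variant `multi`, while `parse_model_id("openai/gpt-4.1-mini")` has no variant.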

### Project Templates

Initialize a new agent project with the CLI:

**Backend agents:**
```bash
lk agent init my-agent --template agent-starter-python
lk agent init my-agent --template agent-starter-node
```

**Frontend apps (React/Next.js, React Native, Swift, Flutter, Android):**
```bash
lk agent init my-frontend --template agent-starter-react
lk agent init my-frontend --template agent-starter-react-native
```

Omit `--template` to see all available templates interactively.

## Critical Rule: Never Trust Model Memory for LiveKit APIs

LiveKit Agents is a fast-evolving SDK. Model training data is outdated the moment it's created. When working with LiveKit:

- **Never assume** API signatures, method names, or configuration options from memory
- **Never guess** SDK behavior or default values
- **Always verify** against live documentation before writing code
- **Always cite** the documentation source when implementing features

This rule applies even when confident about an API. Verify anyway.

## Use LiveKit CLI for Documentation

Before writing any LiveKit code, use the `lk docs` CLI commands for current, verified API information. This prevents reliance on stale model knowledge.

### Search documentation
```bash
lk docs search "voice agent quickstart"
lk docs search "handoffs and tasks"
```

### Fetch specific pages
```bash
lk docs get-page /agents/start/voice-ai-quickstart
lk docs get-page /agents/build/tools /agents/build/vision
```

### Search SDK source code
```bash
lk docs code-search "class AgentSession" --repo livekit/agents
lk docs code-search "@function_tool" --language Python --full-file
```
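When these lookups are scripted, the commands above can be assembled as argument lists and passed to `subprocess`. This is a rough sketch that assumes `lk` is installed and on `PATH`. It uses only the subcommands and the `--repo` flag shown above, and the function names are hypothetical.

```python
import subprocess


def docs_search_cmd(query):
    """Build the argv for `lk docs search "<query>"`."""
    return ["lk", "docs", "search", query]


def docs_get_page_cmd(*paths):
    """Build the argv for `lk docs get-page <path> [<path> ...]`."""
    if not paths:
        raise ValueError("at least one page path is required")
    return ["lk", "docs", "get-page", *paths]


def docs_code_search_cmd(query, repo=None):
    """Build the argv for `lk docs code-search`, optionally with --repo."""
    cmd = ["lk", "docs", "code-search", query]
    if repo:
        cmd += ["--repo", repo]
    return cmd


def run_lk(cmd):
    """Run a built command and return its stdout (requires the lk CLI)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing a list rather than a shell string avoids quoting problems with multi-word queries such as `"handoffs and tasks"`.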

### Check changelogs
```bash
lk docs changelog livekit/agents
lk docs changelog pypi:livekit-agents --
```