
80-livekit-agents-panaversity-agentfactory-1f8ecf89

This skill guides developers through building production-ready voice agents using LiveKit Agents, the open source framework powering ChatGPT's Advanced Voice Mode. Use this when learning to create voice-driven AI products that handle real-time speech recognition, function calling, and interruption handling with the same architecture deployed at enterprise scale.

View source Repository: claude-skill-registry

Install in Claude Code


git clone --depth 1 https://github.com/majiayu000/claude-skill-registry /tmp/80-livekit-agents-panaversity-agentfactory-1f8ecf89 && cp -r /tmp/80-livekit-agents-panaversity-agentfactory-1f8ecf89/skills/agent/80-livekit-agents-panaversity-agentfactory-1f8ecf89 ~/.claude/skills/80-livekit-agents-panaversity-agentfactory-1f8ecf89

Then start a new Claude Code session; the skill loads automatically.


SKILL.md

# Build Your LiveKit Agents Skill

Before learning LiveKit Agents—the framework powering ChatGPT's Advanced Voice Mode—you'll **own** a LiveKit Agents skill.

This is skill-first learning. You build the skill, then the chapter teaches you what it knows and how to make it better. By the end, you have a production-ready voice agent AND a reusable skill for building more.

---

## Why LiveKit Agents?

In September 2023, OpenAI unveiled ChatGPT Voice Mode. The technology behind it? LiveKit. LiveKit also publishes LiveKit Agents, an open source framework that makes it easier for developers to build their own voice AI agents.

LiveKit Agents was used in every demo during the GPT-4o unveil. The framework now powers voice-driven AI products across the industry, from startups to enterprises building Digital FTEs that can hear, speak, and reason in real time.

**What you're learning**: Production voice agent architecture from the framework that runs at scale.

---

## Step 1: Clone Skills-Lab Fresh

Every chapter starts fresh. No state assumptions.

1. Go to [github.com/panaversity/claude-code-skills-lab](https://github.com/panaversity/claude-code-skills-lab)
2. Click the green **Code** button
3. Select **Download ZIP**
4. Extract the ZIP file
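5. Open the extracted folder in your terminal. GitHub names the extracted folder after the repository and branch (for example `claude-code-skills-lab-main`), so adjust the `cd` path below if yours differs.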

```bash
cd claude-code-skills-lab
claude
```

**Why fresh?** Skills accumulate across chapters. A fresh start ensures your LiveKit skill builds on clean foundations, not inherited state.

---

## Step 2: Write Your LEARNING-SPEC.md

Before asking Claude to build anything, define what you want to learn. This is specification-first learning—you specify intent, then the system executes.

Create a new file:

```bash
touch LEARNING-SPEC.md
```

Write your specification:

```markdown
# LiveKit Agents Skill

## What I Want to Learn
Build voice agents using LiveKit's production framework—the same technology
powering ChatGPT's Advanced Voice Mode.

## Why This Matters
- LiveKit Agents handles the hard parts: WebRTC, turn detection, interruptions
- Understanding the framework means understanding what works at scale
- Every voice-enabled Digital FTE I build will use these patterns

## Success Criteria
- [ ] Create voice agent that responds to speech
- [ ] Implement function calling (tool use via voice)
- [ ] Handle interruptions gracefully (barge-in)
- [ ] Understand deployment to Kubernetes

## Key Questions I Have
1. How do Agents, AgentSessions, and Workers relate to each other?
2. How does semantic turn detection work? Why is it better than silence-based?
3. How do I integrate MCP tools into a voice agent?
4. What's the difference between VoicePipelineAgent and MultimodalAgent?
5. How do I handle phone calls (SIP integration)?

## What I Already Know
- Part 10: Chat interfaces, streaming, tool calling UI
- Part 7: Kubernetes deployment, containerization
- Part 6: Agent SDKs (OpenAI, Claude, Google ADK)

## What I'm Not Trying to Learn Yet
- Pipecat (that's Chapter 81)
- Raw OpenAI Realtime API (that's Chapter 82)
- Phone number provisioning details (that's Chapter 84)
```

**Why write a spec?** The AI amplification principle: clear specifications produce excellent results. Vague requests produce confident-looking output that's wrong in subtle ways.
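
Before handing the spec to Claude, it can help to confirm that nothing was left out. The following is a minimal sketch, not part of the chapter's tooling: the helper names `missing_sections` and `open_criteria` are hypothetical, and the required headings are simply the ones used in the template above.

```python
from pathlib import Path

# Headings copied from the spec template above; edit this list if you rename them.
REQUIRED_SECTIONS = [
    "## What I Want to Learn",
    "## Why This Matters",
    "## Success Criteria",
    "## Key Questions I Have",
    "## What I Already Know",
    "## What I'm Not Trying to Learn Yet",
]


def missing_sections(spec_text: str) -> list[str]:
    """Return the required headings that do not appear as a line in the spec."""
    lines = {line.strip() for line in spec_text.splitlines()}
    return [heading for heading in REQUIRED_SECTIONS if heading not in lines]


def open_criteria(spec_text: str) -> list[str]:
    """Return the text of every unchecked '- [ ]' item."""
    marker = "- [ ]"
    return [
        line.strip()[len(marker):].strip()
        for line in spec_text.splitlines()
        if line.strip().startswith(marker)
    ]


if __name__ == "__main__":
    path = Path("LEARNING-SPEC.md")
    if path.exists():
        text = path.read_text(encoding="utf-8")
        print("Missing sections:", missing_sections(text) or "none")
        print("Open success criteria:", len(open_criteria(text)))
    else:
        print("LEARNING-SPEC.md not found in the current directory")
```

Run it from the folder that contains `LEARNING-SPEC.md`. Re-running it at the end of the chapter gives a quick view of which success criteria are still unchecked.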

---

## Step 3: Fetch Official Documentation

Your skill should be built from official sources, not AI memory. A model's training data goes stale as the framework evolves; the official docs are where changes land first.

Ask Claude:

```
Use the context7 skill to fetch the official LiveKit Agents documentation.
I want to understand:
1. Core concepts (Agents, Sessions, Workers)
2. VoicePipelineAgent vs MultimodalAgent
3. Turn detection and interruption handling
4. Function calling and tool integration
5. Deployment patterns

Save key patterns and code examples for building my skill.
```

Claude will:
1. Connect to Context7 (library documentation service)
2. Fetch current LiveKit Agents docs
3. Extract architecture patterns and code examples
4. Prepare knowledge for skill creation

**What you're learning**: Documentation-driven development. The skill you build reflects the framework's current state, not stale training data.

---

## Step 4: Build the Skill

Now create your skill using the documentation Claude just fetched:

```
Using your skill creator skill, create a new skill for LiveKit Agents.
Use the documentation you just fetched from Context7—no self-assumed knowledge.

I will use this skill to build voice agents from hello world to
production systems that handle real phone calls. Focus on:

1. VoicePipelineAgent patterns (STT -> LLM -> TTS pipeline)
2. MultimodalAgent patterns (for Gemini Live, OpenAI Realtime)
3. Semantic turn detection configuration
4. Function calling via voice
5. Kubernetes deployment with Workers

Reference my LEARNING-SPEC.md for context on what I want to learn.
```

Claude will:
1. Read your LEARNING-SPEC.md
2. Apply the fetched documentation
3. Ask clarifying questions (interruption policies, STT/TTS providers, deployment targets)
4. Create the complete skill with references and templates
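
For orientation, the STT -> LLM -> TTS pipeline named in the prompt is three stages chained once per user turn. The toy sketch below shows only that shape. Every name in it (`VoicePipeline`, `fake_stt`, `fake_llm`, `fake_tts`) is a hypothetical stand-in, none of it is a LiveKit API, and it processes a whole turn at once, whereas a real voice framework streams audio and can be interrupted mid-reply.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Three pluggable stages: speech-to-text, language model, text-to-speech."""

    stt: Callable[[bytes], str]
    llm: Callable[[str], str]
    tts: Callable[[str], bytes]

    def handle_turn(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)  # audio -> text
        reply = self.llm(transcript)     # text -> text
        return self.tts(reply)           # text -> audio


# Stand-in stages: "audio" here is just UTF-8 bytes so the sketch runs anywhere.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"Hello, I heard you say: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)

if __name__ == "__main__":
    print(pipeline.handle_turn(b"book a table"))
```

Because each stage is just a callable, swapping a provider means replacing one function and leaving the other two untouched.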

Your skill appears at `.claude/skills/livekit-agents/`.
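
A quick way to confirm the skill landed where expected is to check the directory from Python. This is a minimal sketch: `check_skill_dir` is a hypothetical helper, and it assumes only that the skill directory contains a non-empty `SKILL.md`. Whatever references and templates Claude generates alongside it are not checked.

```python
from pathlib import Path


def check_skill_dir(skill_dir: Path) -> list[str]:
    """Return a list of problems found; an empty list means the basics are in place."""
    if not skill_dir.is_dir():
        return [f"{skill_dir} does not exist"]

    problems = []
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        problems.append("SKILL.md is missing")
    elif not skill_md.read_text(encoding="utf-8").strip():
        problems.append("SKILL.md is empty")
    return problems


if __name__ == "__main__":
    # Path taken from this step; run from the skills-lab project root.
    found = check_skill_dir(Path(".claude/skills/livekit-agents"))
    for line in found or ["skill directory looks OK"]:
        print(line)
```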

---

## Step 5: Verify It Works

Test your skill with a simple prompt:

```
Using the livekit-agents skill, create a minimal voice agent that:
1. Listens for speech
2. Responds with "Hello, I heard you say: [transcription]"
3. Uses Deepgram for STT and Cartesia for TTS

Just the code, no explanation.
```

If your skill works, Claude generates a working agent skeleton. If it doesn't, Claude asks for clarification—which tells you what's missing from your skill.
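
Before running whatever Claude generates, a static check can catch the most basic problems without installing any LiveKit packages. The sketch below uses only the standard-library `ast` module. `check_agent_source` is a hypothetical helper, and its two checks (an import from the `livekit` package and a top-level `async def entrypoint`) are assumptions based on the skeleton this chapter expects, not requirements stated by LiveKit.

```python
import ast


def check_agent_source(source: str) -> list[str]:
    """Statically inspect generated agent code; return a list of problems."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:  # also covers IndentationError
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    problems = []

    # Look for `import livekit...` or `from livekit... import ...` anywhere in the file.
    imports_livekit = False
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] == "livekit":
                imports_livekit = True
        elif isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] == "livekit" for alias in node.names):
                imports_livekit = True
    if not imports_livekit:
        problems.append("no import from the livekit package")

    # The entrypoint is expected to be a coroutine defined at module level.
    has_entrypoint = any(
        isinstance(node, ast.AsyncFunctionDef) and node.name == "entrypoint"
        for node in tree.body
    )
    if not has_entrypoint:
        problems.append("no top-level 'async def entrypoint'")

    return problems
```

Pass it the generated file's contents, for example `check_agent_source(Path("agent.py").read_text())`, where `agent.py` is a placeholder filename. An empty list means only that the file parses and has the expected shape, not that the agent runs.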

**Expected output structure**:

```python
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import deepgram, cartesia, openai

async def entrypoint(ctx: JobContext):
    # Your agent implementation
    ...

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=ent