Skill · 522 repo stars · updated 14d ago

81-pipecat

This Claude Code skill teaches frame-based voice AI development using Pipecat, a composition-focused framework with 40+ provider integrations. Use it to build a production-ready voice agent, from initial setup through optimization, while learning how Pipecat's pipeline architecture differs from job-based alternatives like LiveKit Agents.

Repository: claude-skill-registry

Install in Claude Code

git clone --depth 1 https://github.com/majiayu000/claude-skill-registry /tmp/81-pipecat && cp -r /tmp/81-pipecat/skills/agent/81-pipecat ~/.claude/skills/81-pipecat

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Build Your Pipecat Skill

Before learning Pipecat—a frame-based voice AI framework with 40+ provider integrations—you'll **own** a Pipecat skill.

This is skill-first learning. You build the skill, then the chapter teaches you what it knows and how to make it better. By the end, you have a production-ready voice agent AND a reusable skill for building more.

---

## Why Pipecat?

Pipecat started as an internal framework at Daily.co for building voice bots. After the team saw how well it worked, they open-sourced it in 2024. Since then, it has grown to over 8,900 GitHub stars (at the time of writing) and supports 40+ AI service integrations.

The framework's key insight: **everything is a frame**. Audio data, text transcriptions, LLM responses, control signals—all frames flowing through a pipeline of processors. This simple abstraction enables powerful composition: swap providers with one line, add custom processing anywhere, deploy to any transport.
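To make the "everything is a frame" idea concrete, here is a minimal, self-contained sketch. It does not import Pipecat: the `Frame`, `TextFrame`, and `EndFrame` classes are toy stand-ins that borrow the framework's vocabulary, and `uppercase_llm`, `reverse_llm`, `tag_tts`, and `run_pipeline` are hypothetical names invented for illustration. The point is only the shape: processors consume frames and emit frames, so a pipeline is an ordered list you can rearrange.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Frame:
    """Toy base type: everything moving through the pipeline is a Frame."""


@dataclass
class TextFrame(Frame):
    text: str


@dataclass
class EndFrame(Frame):
    """Toy control signal: no more data will follow."""


# A processor maps a stream of frames to a stream of frames.
Processor = Callable[[Iterable[Frame]], Iterator[Frame]]


def uppercase_llm(frames: Iterable[Frame]) -> Iterator[Frame]:
    """Stand-in for one 'LLM provider': upper-cases text, passes the rest on."""
    for frame in frames:
        yield TextFrame(frame.text.upper()) if isinstance(frame, TextFrame) else frame


def reverse_llm(frames: Iterable[Frame]) -> Iterator[Frame]:
    """Stand-in for a second 'LLM provider' with the same frame contract."""
    for frame in frames:
        yield TextFrame(frame.text[::-1]) if isinstance(frame, TextFrame) else frame


def tag_tts(frames: Iterable[Frame]) -> Iterator[Frame]:
    """Stand-in for a 'TTS' stage: wraps text, passes control frames on."""
    for frame in frames:
        if isinstance(frame, TextFrame):
            yield TextFrame(f"<speak>{frame.text}</speak>")
        else:
            yield frame


def run_pipeline(processors: List[Processor], frames: Iterable[Frame]) -> List[Frame]:
    """Chain processors in order and collect whatever comes out the far end."""
    stream: Iterator[Frame] = iter(frames)
    for processor in processors:
        stream = processor(stream)
    return list(stream)


source = [TextFrame("hello"), EndFrame()]
result_a = run_pipeline([uppercase_llm, tag_tts], source)
result_b = run_pipeline([reverse_llm, tag_tts], source)  # the one-line provider swap
```

Because both stand-in "providers" accept and emit the same frame types, swapping one for the other changes a single list entry and nothing downstream. Control frames such as `EndFrame` pass through every stage untouched.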

**What you're learning**: A compositional approach to voice AI that gives you maximum flexibility.

---

## Pipecat vs LiveKit

You learned LiveKit Agents in Chapter 80. Here's how Pipecat differs:

| Dimension | LiveKit Agents | Pipecat |
|-----------|----------------|---------|
| **Core Abstraction** | Jobs (distributed work) | Frames (data flow) |
| **Architecture** | Workers, Sessions, Agents | Pipelines, Processors, Transports |
| **Provider Strategy** | Curated integrations | Plugin ecosystem (40+) |
| **Transport** | WebRTC-first | Transport-agnostic |
| **Turn Detection** | Semantic (transformer) | Configurable (VAD-based) |

Neither is "better"; they solve different problems. LiveKit excels at enterprise scale and semantic turn detection. Pipecat excels at flexibility and rapid iteration.
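The "Configurable (VAD-based)" cell in the table is easier to picture with a toy example. The sketch below is not Pipecat's voice activity detector. It is a deliberately simple energy-threshold detector, and `rms`, `detect_turn_ends`, `threshold`, and `silence_chunks` are all hypothetical names. It only illustrates what "configurable" can mean: a turn ends once speech is followed by enough consecutive quiet chunks, and both knobs are tunable.

```python
from typing import List, Sequence


def rms(chunk: Sequence[float]) -> float:
    """Root-mean-square energy of one audio chunk (samples in -1.0..1.0)."""
    if not chunk:
        return 0.0
    return (sum(sample * sample for sample in chunk) / len(chunk)) ** 0.5


def detect_turn_ends(
    chunks: Sequence[Sequence[float]],
    threshold: float = 0.1,
    silence_chunks: int = 3,
) -> List[int]:
    """Return the chunk indices at which a speaker's turn is considered over.

    A turn ends when speech has been heard and is then followed by
    `silence_chunks` consecutive chunks whose energy is below `threshold`.
    """
    turn_ends: List[int] = []
    speaking = False
    quiet_run = 0
    for index, chunk in enumerate(chunks):
        if rms(chunk) >= threshold:
            speaking = True
            quiet_run = 0
        elif speaking:
            quiet_run += 1
            if quiet_run >= silence_chunks:
                turn_ends.append(index)
                speaking = False
                quiet_run = 0
    return turn_ends


loud = [0.5] * 4
quiet = [0.0] * 4
audio = [quiet, loud, loud, quiet, quiet, quiet, quiet, loud, quiet, quiet, quiet]

patient = detect_turn_ends(audio, silence_chunks=3)
eager = detect_turn_ends(audio, silence_chunks=2)
```

Lowering `silence_chunks` makes the agent respond sooner but raises the risk of interrupting a speaker who merely paused. A semantic turn detector tries to avoid that trade-off by looking at what was said rather than only at the silence.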

---

## Step 1: Clone Skills-Lab Fresh

Every chapter starts fresh. No state assumptions.

1. Go to [github.com/panaversity/claude-code-skills-lab](https://github.com/panaversity/claude-code-skills-lab)
2. Click the green **Code** button
3. Select **Download ZIP**
4. Extract the ZIP file
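5. Open the extracted folder in your terminal. GitHub usually names it after the repository and branch (for example `claude-code-skills-lab-main`), so adjust the path below if yours differs.

```bash
cd claude-code-skills-lab
claude
```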

**Why fresh?** Skills accumulate across chapters. A fresh start ensures your Pipecat skill builds on clean foundations, not inherited state from Chapter 80.

---

## Step 2: Write Your LEARNING-SPEC.md

Before asking Claude to build anything, define what you want to learn. This is specification-first learning—you specify intent, then the system executes.

Create a new file:

```bash
touch LEARNING-SPEC.md
```

Write your specification:

```markdown
# Pipecat Skill

## What I Want to Learn
Build voice agents using Pipecat's frame-based pipeline architecture—a flexible
alternative to LiveKit that supports 40+ AI service integrations.

## Why This Matters
- Pipecat's frame model enables custom processing anywhere in the pipeline
- 40+ provider integrations means I can optimize for cost, latency, or quality
- Transport-agnostic design means I deploy once, run anywhere
- S2S (speech-to-speech) model support (OpenAI Realtime, Gemini Live) through a unified interface

## Success Criteria
- [ ] Create voice pipeline that responds to speech
- [ ] Swap providers without changing pipeline structure
- [ ] Add custom processor for domain-specific logic
- [ ] Configure different transports (WebRTC, WebSocket, local)

## Key Questions I Have
1. How do frames flow through the pipeline?
2. What's the difference between AudioRawFrame and TextFrame?
3. How do I integrate OpenAI Realtime through Pipecat?
4. How do I build a custom processor?
5. What transports are available and when do I use each?

## What I Already Know
- Chapter 80: LiveKit Agents (Agents, Sessions, Workers architecture)
- Chapter 79: Voice AI fundamentals (STT, LLM, TTS pipeline)
- Part 10: Chat interfaces, streaming, WebSocket communication

## What I'm Not Trying to Learn Yet
- Raw OpenAI Realtime API (that's Chapter 82)
- Raw Gemini Live API (that's Chapter 83)
- Phone integration specifics (that's Chapter 84)
```

**Why write a spec?** The AI amplification principle: clear specifications produce excellent results. Your spec focuses the skill on what YOU need, not generic patterns.
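Because the success criteria are plain markdown checkboxes, you can track them mechanically as you work through the chapter. The following is a small optional helper, not part of the skills lab: `read_criteria` and `remaining` are hypothetical names, and the sketch assumes criteria are written as `- [ ]` or `- [x]` list items, as in the spec above.

```python
import re
from typing import List, Tuple

# Matches markdown task-list items such as "- [ ] text" or "- [x] text".
CHECKBOX = re.compile(r"^\s*- \[( |x|X)\] (.+)$")


def read_criteria(spec_text: str) -> List[Tuple[bool, str]]:
    """Return (done, description) for every checkbox line in the spec."""
    criteria: List[Tuple[bool, str]] = []
    for line in spec_text.splitlines():
        match = CHECKBOX.match(line)
        if match:
            done = match.group(1).lower() == "x"
            criteria.append((done, match.group(2).strip()))
    return criteria


def remaining(spec_text: str) -> List[str]:
    """Return the descriptions of criteria that are not yet ticked off."""
    return [text for done, text in read_criteria(spec_text) if not done]


SAMPLE = """\
## Success Criteria
- [x] Create voice pipeline that responds to speech
- [ ] Swap providers without changing pipeline structure
- [ ] Add custom processor for domain-specific logic
"""

todo = remaining(SAMPLE)
```

To run it against your own file, read the text with `pathlib.Path("LEARNING-SPEC.md").read_text()` and pass it to `remaining`.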

---

## Step 3: Fetch Official Documentation

Your skill should be built from official sources, not AI memory. A model's training data goes stale; official docs track the current release.

Ask Claude:

```
Use the context7 skill to fetch the official Pipecat documentation.
I want to understand:
1. Frame-based architecture (what are frames, how do they flow)
2. Processors and pipelines (how to compose them)
3. Available transports (Daily, WebSocket, local)
4. Provider integrations (STT, LLM, TTS plugins)
5. S2S model support (OpenAI Realtime, Gemini Live)

Save key patterns and code examples for building my skill.
```

Claude will:
1. Connect to Context7 (library documentation service)
2. Fetch current Pipecat docs from GitHub
3. Extract architecture patterns and code examples
4. Prepare knowledge for skill creation

**What you're learning**: Documentation-driven development. The skill you build reflects the framework's current state, not stale training data.

---

## Step 4: Build the Skill

Now create your skill using the documentation Claude just fetched:

```
Using your skill creator skill, create a new skill for Pipecat.
Use the documentation you just fetched from Context7. Do not rely on your own prior knowledge.

I will use this skill to build voice agents from hello world to
production systems with custom processing. Focus on:

1. Frame types (AudioRawFrame, TextFrame, EndFrame, control frames)
2. Processor patterns (how to transform frames)
3. Pipeline composition (how to chain processors)
4. Transport configuration (Daily, WebSocket, local)
5. Provider plugins (how to swap STT, LLM, TTS)
6. S2S model integration (OpenAI Realtime, Gemini Live through Pipecat)

Reference my LEARNING-SPEC.md for context on what I want to learn.
```
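Before looking at what Claude generates, it may help to see the general shape of a custom processor. The sketch below is a toy written without Pipecat installed: the frame classes are simplified stand-ins that reuse the names from the prompt above, and `RedactDigitsProcessor`, `source`, and `collect` are hypothetical. Treat it as an assumption about the pattern (transform the frames you care about, pass everything else through, stop on the end signal), then check the real base class and method signatures in the documentation you fetched.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Iterable, List


@dataclass
class Frame:
    """Toy base type for this sketch only."""


@dataclass
class AudioRawFrame(Frame):
    audio: bytes      # raw audio bytes
    sample_rate: int  # samples per second


@dataclass
class TextFrame(Frame):
    text: str


@dataclass
class EndFrame(Frame):
    """Toy control frame: the stream is finished."""


class RedactDigitsProcessor:
    """Hypothetical domain-specific processor: masks digits in text frames."""

    def __init__(self, mask: str = "#") -> None:
        self.mask = mask
        self.redacted_frames = 0  # how many text frames were changed

    async def process(self, frames: AsyncIterator[Frame]) -> AsyncIterator[Frame]:
        async for frame in frames:
            if isinstance(frame, TextFrame):
                masked = "".join(self.mask if c.isdigit() else c for c in frame.text)
                if masked != frame.text:
                    self.redacted_frames += 1
                yield TextFrame(masked)
            else:
                yield frame  # audio and control frames pass through untouched
            if isinstance(frame, EndFrame):
                return  # nothing after the end signal is processed


async def source(frames: Iterable[Frame]) -> AsyncIterator[Frame]:
    """Turn a plain list into an async frame stream."""
    for frame in frames:
        yield frame


async def collect(stream: AsyncIterator[Frame]) -> List[Frame]:
    return [frame async for frame in stream]


processor = RedactDigitsProcessor()
frames_in = [
    AudioRawFrame(b"\x00\x01", 16000),
    TextFrame("card 4242"),
    EndFrame(),
    TextFrame("ignored"),
]
frames_out = asyncio.run(collect(processor.process(source(frames_in))))
```

The distinction worth noticing is that the processor never needs to know what sits upstream or downstream. It inspects each frame's type, acts on `TextFrame`, and leaves `AudioRawFrame` alone, which is what lets the same processor be dropped into different pipelines.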

Claude will:
1. Read your LEARNING-SPEC.md
2. Apply the fetched documentation
3. Ask clarifyi