81-pipecat
This Claude Code skill teaches frame-based voice AI development with Pipecat, a composition-focused framework with 40+ provider integrations. You build the skill hands-on and use it to create a production-ready voice agent, progressing from initial setup through optimization. Along the way you learn how Pipecat's pipeline architecture differs from job-based alternatives such as LiveKit Agents.
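A minimal plain-Python sketch makes "frame-based" concrete. Typed frames flow one at a time through an ordered chain of processors. Each processor transforms the frames it cares about and passes the rest through unchanged. Every name here is hypothetical: `Frame`, `AudioFrame`, `TextFrame`, `EndFrame`, `Processor`, `FakeSTT`, `Shout`, and `Pipeline` are toy stand-ins, not Pipecat's classes. The real framework is asynchronous and push-based, not this synchronous loop.

```python
from dataclasses import dataclass


# Toy frame types: hypothetical stand-ins, not Pipecat's real classes.
@dataclass
class Frame:
    pass


@dataclass
class AudioFrame(Frame):
    audio: bytes  # raw audio payload (here just bytes holding fake "speech")


@dataclass
class TextFrame(Frame):
    text: str


@dataclass
class EndFrame(Frame):
    """Control frame: signals that the stream is finished."""


class Processor:
    """Receives one frame and returns zero or more frames for the next stage."""

    def process(self, frame: Frame) -> list[Frame]:
        return [frame]  # default: pass every frame through unchanged


class FakeSTT(Processor):
    """Pretends to transcribe by decoding the audio bytes as UTF-8 text."""

    def process(self, frame: Frame) -> list[Frame]:
        if isinstance(frame, AudioFrame):
            return [TextFrame(frame.audio.decode("utf-8"))]
        return [frame]  # non-audio frames (e.g. EndFrame) pass through


class Shout(Processor):
    """Custom processing step: upper-cases text frames only."""

    def process(self, frame: Frame) -> list[Frame]:
        if isinstance(frame, TextFrame):
            return [TextFrame(frame.text.upper())]
        return [frame]


class Pipeline:
    def __init__(self, processors: list[Processor]):
        self.processors = processors

    def run(self, frames: list[Frame]) -> list[Frame]:
        output: list[Frame] = []
        for frame in frames:
            # Push one incoming frame through every stage before the next one;
            # a stage may emit zero, one, or several frames per input.
            batch = [frame]
            for proc in self.processors:
                batch = [out for f in batch for out in proc.process(f)]
            output.extend(batch)
        return output


pipeline = Pipeline([FakeSTT(), Shout()])
result = pipeline.run([AudioFrame(b"hello"), AudioFrame(b"world"), EndFrame()])
print(result)  # → [TextFrame(text='HELLO'), TextFrame(text='WORLD'), EndFrame()]
```

You can replace `FakeSTT()` with any other object that has the same `process` method without touching the rest of the list. That is the composition property the description refers to.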
git clone --depth 1 https://github.com/majiayu000/claude-skill-registry /tmp/81-pipecat && cp -r /tmp/81-pipecat/skills/agent/81-pipecat ~/.claude/skills/81-pipecat

SKILL.md
# Build Your Pipecat Skill

Before learning Pipecat, a frame-based voice AI framework with 40+ provider integrations, you'll **own** a Pipecat skill. This is skill-first learning: you build the skill, then the chapter teaches you what it knows and how to make it better. By the end, you have a production-ready voice agent AND a reusable skill for building more.

---

## Why Pipecat?

Pipecat started as an internal framework at Daily.co for building voice bots. After the team saw how well it worked, they open-sourced it in 2024. Since then, it has grown to over 8,900 GitHub stars and supports 40+ AI service integrations.

The framework's key insight: **everything is a frame**. Audio data, text transcriptions, LLM responses, and control signals are all frames flowing through a pipeline of processors. This simple abstraction enables powerful composition: swap a provider by changing one line, add custom processing at any point in the pipeline, and deploy to any transport.

**What you're learning**: A compositional approach to voice AI that gives you maximum flexibility.

---

## Pipecat vs LiveKit

You learned LiveKit Agents in Chapter 80. Here's how Pipecat differs:

| Dimension | LiveKit Agents | Pipecat |
|-----------|----------------|---------|
| **Core Abstraction** | Jobs (distributed work) | Frames (data flow) |
| **Architecture** | Workers, Sessions, Agents | Pipelines, Processors, Transports |
| **Provider Strategy** | Curated integrations | Plugin ecosystem (40+) |
| **Transport** | WebRTC-first | Transport-agnostic |
| **Turn Detection** | Semantic (transformer) | Configurable (VAD-based) |

Neither is "better"; they solve different problems. LiveKit excels at enterprise scale and semantic turn detection. Pipecat excels at flexibility and rapid iteration.

---

## Step 1: Clone Skills-Lab Fresh

Every chapter starts fresh, with no assumptions about state left over from earlier chapters.

1. Go to [github.com/panaversity/claude-code-skills-lab](https://github.com/panaversity/claude-code-skills-lab)
2. Click the green **Code** button
3. Select **Download ZIP**
4. Extract the ZIP file
5. Open the extracted folder in your terminal

```bash
cd claude-code-skills-lab
claude
```

**Why fresh?** Skills accumulate across chapters. A fresh start ensures your Pipecat skill builds on clean foundations, not on state inherited from Chapter 80.

---

## Step 2: Write Your LEARNING-SPEC.md

Before asking Claude to build anything, define what you want to learn. This is specification-first learning: you specify intent, then the system executes.

Create a new file:

```bash
touch LEARNING-SPEC.md
```

Write your specification:

```markdown
# Pipecat Skill

## What I Want to Learn

Build voice agents using Pipecat's frame-based pipeline architecture, a flexible alternative to LiveKit that supports 40+ AI service integrations.

## Why This Matters

- Pipecat's frame model enables custom processing anywhere in the pipeline
- 40+ provider integrations mean I can optimize for cost, latency, or quality
- Transport-agnostic design means I can build once and run on any transport
- S2S (speech-to-speech) model support (OpenAI Realtime, Gemini Live) through a unified interface

## Success Criteria

- [ ] Create a voice pipeline that responds to speech
- [ ] Swap providers without changing pipeline structure
- [ ] Add a custom processor for domain-specific logic
- [ ] Configure different transports (WebRTC, WebSocket, local)

## Key Questions I Have

1. How do frames flow through the pipeline?
2. What's the difference between AudioRawFrame and TextFrame?
3. How do I integrate OpenAI Realtime through Pipecat?
4. How do I build a custom processor?
5. What transports are available and when do I use each?

## What I Already Know

- Chapter 80: LiveKit Agents (Agents, Sessions, Workers architecture)
- Chapter 79: Voice AI fundamentals (STT, LLM, TTS pipeline)
- Part 10: Chat interfaces, streaming, WebSocket communication

## What I'm Not Trying to Learn Yet

- Raw OpenAI Realtime API (that's Chapter 82)
- Raw Gemini Live API (that's Chapter 83)
- Phone integration specifics (that's Chapter 84)
```

**Why write a spec?** The AI amplification principle: clear specifications produce excellent results. Your spec focuses the skill on what YOU need, not on generic patterns.

---

## Step 3: Fetch Official Documentation

Your skill should be built from official sources, not AI memory. A model's training data goes stale as the framework changes, while the official docs track the current release.

Ask Claude:

```
Use the context7 skill to fetch the official Pipecat documentation. I want to understand:
1. Frame-based architecture (what are frames, how do they flow)
2. Processors and pipelines (how to compose them)
3. Available transports (Daily, WebSocket, local)
4. Provider integrations (STT, LLM, TTS plugins)
5. S2S model support (OpenAI Realtime, Gemini Live)

Save key patterns and code examples for building my skill.
```

Claude will:

1. Connect to Context7 (a library documentation service)
2. Fetch current Pipecat docs from GitHub
3. Extract architecture patterns and code examples
4. Prepare knowledge for skill creation

**What you're learning**: Documentation-driven development. The skill you build reflects the framework's current state, not stale training data.

---

## Step 4: Build the Skill

Now create your skill using the documentation Claude just fetched:

```
Using your skill creator skill, create a new skill for Pipecat. Use the documentation you just fetched from Context7. Do not rely on self-assumed knowledge. I will use this skill to build voice agents from hello world to production systems with custom processing.

Focus on:
1. Frame types (AudioRawFrame, TextFrame, EndFrame, control frames)
2. Processor patterns (how to transform frames)
3. Pipeline composition (how to chain processors)
4. Transport configuration (Daily, WebSocket, local)
5. Provider plugins (how to swap STT, LLM, TTS)
6. S2S model integration (OpenAI Realtime, Gemini Live through Pipecat)

Reference my LEARNING-SPEC.md for context on what I want to learn.
```

Claude will:

1. Read your LEARNING-SPEC.md
2. Apply the fetched documentation
3. Ask clarifying questions
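The Step 4 focus areas (processor patterns, pipeline composition, and provider swapping) can be previewed with a small asyncio sketch. It is a toy built on assumptions, not Pipecat code. `ToyProcessor`, `LLMStage`, `RedactDigits`, `Collector`, `EchoLLM`, `ReverseLLM`, `link`, and `run_pipeline` are all hypothetical names. It borrows one idea from Pipecat's design: each processor handles a frame in an async method and then pushes results to the next processor. Pipecat's real `FrameProcessor.process_frame` also takes a frame-direction argument, so compare this shape with the documentation your skill fetched before reusing it.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


# Toy frames (hypothetical, not Pipecat's classes).
@dataclass
class TextFrame:
    text: str


@dataclass
class EndFrame:
    """Control frame marking the end of the stream."""


class ToyProcessor:
    """Push-based stage: handles a frame, then pushes results downstream."""

    def __init__(self) -> None:
        self.next: "ToyProcessor | None" = None

    async def process_frame(self, frame) -> None:
        await self.push_frame(frame)  # default behavior: pass through

    async def push_frame(self, frame) -> None:
        if self.next is not None:
            await self.next.process_frame(frame)


class LLMProvider(Protocol):
    """Any object with this method can be plugged into LLMStage."""

    async def complete(self, prompt: str) -> str: ...


class EchoLLM:
    async def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class ReverseLLM:
    async def complete(self, prompt: str) -> str:
        return prompt[::-1]


class LLMStage(ToyProcessor):
    def __init__(self, provider: LLMProvider) -> None:
        super().__init__()
        self.provider = provider

    async def process_frame(self, frame) -> None:
        if isinstance(frame, TextFrame):
            frame = TextFrame(await self.provider.complete(frame.text))
        await self.push_frame(frame)  # control frames pass through untouched


class RedactDigits(ToyProcessor):
    """Custom domain logic: mask digits before the LLM stage sees them."""

    async def process_frame(self, frame) -> None:
        if isinstance(frame, TextFrame):
            masked = "".join("#" if ch.isdigit() else ch for ch in frame.text)
            frame = TextFrame(masked)
        await self.push_frame(frame)


class Collector(ToyProcessor):
    """End of the chain; a transport's output would sit here in a real app."""

    def __init__(self) -> None:
        super().__init__()
        self.frames: list = []

    async def process_frame(self, frame) -> None:
        self.frames.append(frame)


def link(*stages: ToyProcessor) -> None:
    """Wire each stage to the one after it."""
    for upstream, downstream in zip(stages, stages[1:]):
        upstream.next = downstream


async def run_pipeline(provider: LLMProvider, inputs: list) -> list:
    sink = Collector()
    head = RedactDigits()
    link(head, LLMStage(provider), sink)  # only the provider argument varies
    for frame in inputs:
        await head.process_frame(frame)
    return sink.frames


frames = [TextFrame("card 1234"), EndFrame()]
print(asyncio.run(run_pipeline(EchoLLM(), frames)))
# → [TextFrame(text='echo: card ####'), EndFrame()]
print(asyncio.run(run_pipeline(ReverseLLM(), frames)))
# → [TextFrame(text='#### drac'), EndFrame()]
```

The pipeline structure (`RedactDigits`, then `LLMStage`, then `Collector`) never changes. Only the provider object passed to `run_pipeline` differs, and the custom `RedactDigits` stage slots in without either provider knowing about it.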