Skip to main content
ClaudeWave
Skill63 estrellas del repoactualizado 3d ago

pydantic-ai-model-integration

Configure LLM providers, use fallback models, handle streaming, and manage model settings in PydanticAI. Use when selecting models, implementing resilience, or optimizing API calls.

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/existential-birds/beagle /tmp/pydantic-ai-model-integration && cp -r /tmp/pydantic-ai-model-integration/plugins/beagle-ai/skills/pydantic-ai-model-integration ~/.claude/skills/pydantic-ai-model-integration
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

# PydanticAI Model Integration

## Provider Model Strings

Format: `provider:model-name`

```python
from pydantic_ai import Agent

# OpenAI
Agent('openai:gpt-4o')
Agent('openai:gpt-4o-mini')
Agent('openai:o1-preview')

# Anthropic
Agent('anthropic:claude-sonnet-4-5')
Agent('anthropic:claude-haiku-4-5')

# Google (API Key)
Agent('google-gla:gemini-2.0-flash')
Agent('google-gla:gemini-2.0-pro')

# Google (Vertex AI)
Agent('google-vertex:gemini-2.0-flash')

# Groq
Agent('groq:llama-3.3-70b-versatile')
Agent('groq:mixtral-8x7b-32768')

# Mistral
Agent('mistral:mistral-large-latest')

# Other providers
Agent('cohere:command-r-plus')
Agent('bedrock:anthropic.claude-3-sonnet')
```

## Model Settings

```python
from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

agent = Agent(
    'openai:gpt-4o',
    model_settings=ModelSettings(
        temperature=0.7,
        max_tokens=1000,
        top_p=0.9,
        timeout=30.0,  # Request timeout
    )
)

# Override per-run
result = await agent.run(
    'Generate creative text',
    model_settings=ModelSettings(temperature=1.0)
)
```

## Fallback Models

Chain models for resilience:

```python
from pydantic_ai.models.fallback import FallbackModel

# Try models in order until one succeeds
fallback = FallbackModel(
    'openai:gpt-4o',
    'anthropic:claude-sonnet-4-5',
    'google-gla:gemini-2.0-flash'
)

agent = Agent(fallback)
result = await agent.run('Hello')

# Custom fallback conditions
from pydantic_ai.exceptions import ModelAPIError

def should_fallback(error: Exception) -> bool:
    """Only fallback on rate limits or server errors."""
    if isinstance(error, ModelAPIError):
        return error.status_code in (429, 500, 502, 503)
    return False

fallback = FallbackModel(
    'openai:gpt-4o',
    'anthropic:claude-sonnet-4-5',
    fallback_on=should_fallback
)
```

## Streaming Responses

```python
async def stream_response():
    async with agent.run_stream('Tell me a story') as response:
        # Stream text output
        async for chunk in response.stream_output():
            print(chunk, end='', flush=True)

    # Access final result after streaming
    print(f"\nTokens used: {response.usage().total_tokens}")
```

### Streaming with Structured Output

```python
from pydantic import BaseModel

class Story(BaseModel):
    title: str
    content: str
    moral: str

agent = Agent('openai:gpt-4o', output_type=Story)

async with agent.run_stream('Write a fable') as response:
    # For structured output, stream_output yields partial JSON
    async for partial in response.stream_output():
        print(partial)  # Partial Story object as parsed

    # Final validated result
    story = response.output
```

## Dynamic Model Selection

```python
import os

# Environment-based selection
model = os.getenv('PYDANTIC_AI_MODEL', 'openai:gpt-4o')
agent = Agent(model)

# Runtime model override
result = await agent.run(
    'Hello',
    model='anthropic:claude-sonnet-4-5'  # Override default
)

# Context manager override
with agent.override(model='google-gla:gemini-2.0-flash'):
    result = agent.run_sync('Hello')
```

## Deferred Model Checking

Delay model validation for testing:

```python
# Default: Validates model immediately (checks env vars)
agent = Agent('openai:gpt-4o')

# Deferred: Validates only on first run
agent = Agent('openai:gpt-4o', defer_model_check=True)

# Useful for testing with override
with agent.override(model=TestModel()):
    result = agent.run_sync('Test')  # No OpenAI key needed
```

## Usage Tracking

```python
result = await agent.run('Hello')

# Request usage (last request)
usage = result.usage()
print(f"Input tokens: {usage.input_tokens}")
print(f"Output tokens: {usage.output_tokens}")
print(f"Total tokens: {usage.total_tokens}")

# Full run usage (all requests in run)
run_usage = result.run_usage()
print(f"Total requests: {run_usage.requests}")
```

## Usage Limits

```python
from pydantic_ai.usage import UsageLimits

# Limit token usage
result = await agent.run(
    'Generate content',
    usage_limits=UsageLimits(
        total_tokens=1000,
        request_tokens=500,
        response_tokens=500,
    )
)
```

## Provider-Specific Features

### OpenAI

```python
from pydantic_ai.models.openai import OpenAIModel

model = OpenAIModel(
    'gpt-4o',
    api_key='your-key',  # Or use OPENAI_API_KEY env var
    base_url='https://custom-endpoint.com'  # For Azure, proxies
)
```

### Anthropic

```python
from pydantic_ai.models.anthropic import AnthropicModel

model = AnthropicModel(
    'claude-sonnet-4-5',
    api_key='your-key'  # Or ANTHROPIC_API_KEY
)
```

## Common Model Patterns

| Use Case | Recommendation |
|----------|---------------|
| General purpose | `openai:gpt-4o` or `anthropic:claude-sonnet-4-5` |
| Fast/cheap | `openai:gpt-4o-mini` or `anthropic:claude-haiku-4-5` |
| Long context | `anthropic:claude-sonnet-4-5` (200k) or `google-gla:gemini-2.0-flash` |
| Reasoning | `openai:o1-preview` |
| Cost-sensitive prod | `FallbackModel` with fast model first |

## Check gates before ship

Use these only where they prevent obvious misconfiguration; they do not replace integration tests.

- **Fallback chain order:** **Pass:** The first model passed to `FallbackModel(...)` is the intended primary; each subsequent model is a deliberate fallback (not reversed by mistake).
- **Secrets:** **Pass:** Production and shared scripts load API keys from environment variables or a platform secret store; no real keys committed (placeholders only in examples).
release-tagSlash Command

tag and push a release after the release PR is merged

releaseSlash Command

create a release PR (auto-detects previous tag)

deepagents-architectureSkill

Guides architectural decisions for Deep Agents applications. Use when deciding between Deep Agents vs alternatives, choosing backend strategies, designing subagent systems, or selecting middleware approaches.

deepagents-code-reviewSkill

Reviews Deep Agents code for bugs, anti-patterns, and improvements. Use when reviewing code that uses create_deep_agent, backends, subagents, middleware, or human-in-the-loop patterns. Catches common configuration and usage mistakes.

deepagents-implementationSkill

Implements agents using Deep Agents. Use when building agents with create_deep_agent, configuring backends, defining subagents, adding middleware, or setting up human-in-the-loop workflows.

langgraph-architectureSkill

Guides architectural decisions for LangGraph applications. Use when deciding between LangGraph vs alternatives, choosing state management strategies, designing multi-agent systems, or selecting persistence and streaming approaches.

langgraph-code-reviewSkill

Reviews LangGraph code for bugs, anti-patterns, and improvements. Use when reviewing code that uses StateGraph, nodes, edges, checkpointing, or other LangGraph features. Catches common mistakes in state management, graph structure, and async patterns.

langgraph-implementationSkill

Implements stateful agent graphs using LangGraph. Use when building graphs, adding nodes/edges, defining state schemas, implementing checkpointing, handling interrupts, or creating multi-agent systems with LangGraph.