Skill · 63 repo stars · updated 3d ago

pydantic-ai-testing

Test PydanticAI agents using TestModel, FunctionModel, VCR cassettes, and inline snapshots. Use when writing unit tests, mocking LLM responses, or recording API interactions.

Install in Claude Code
git clone --depth 1 https://github.com/existential-birds/beagle /tmp/pydantic-ai-testing && cp -r /tmp/pydantic-ai-testing/plugins/beagle-ai/skills/pydantic-ai-testing ~/.claude/skills/pydantic-ai-testing
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Testing PydanticAI Agents

## TestModel (Deterministic Testing)

Use `TestModel` for tests without API calls:

```python
import pytest
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

def test_agent_basic():
    agent = Agent('openai:gpt-4o')

    # Override with TestModel for testing
    result = agent.run_sync('Hello', model=TestModel())

    # TestModel generates deterministic output based on output_type
    assert isinstance(result.output, str)
```

## TestModel Configuration

```python
from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

agent = Agent('openai:gpt-4o')

# Custom text output
model = TestModel(custom_output_text='Custom response')
result = agent.run_sync('Hello', model=model)
assert result.output == 'Custom response'

# Custom structured output (for agents with an output_type)
class Response(BaseModel):
    message: str
    score: int

structured_agent = Agent('openai:gpt-4o', output_type=Response)
model = TestModel(custom_output_args={'message': 'Test', 'score': 42})
result = structured_agent.run_sync('Hello', model=model)
assert result.output.message == 'Test'

# Seed for reproducible generated output
model = TestModel(seed=42)

# Restrict which tools are called (the default, 'all', calls every registered tool)
model = TestModel(call_tools=['my_tool', 'another_tool'])
```

## Override Context Manager

```python
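from dataclasses import dataclass

from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

# Placeholder definitions so the example runs on its own
class MockDB:
    """Stand-in for a real database client."""

@dataclass
class MyDeps:
    db: MockDB

agent = Agent('openai:gpt-4o', deps_type=MyDeps)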

def test_with_override():
    mock_deps = MyDeps(db=MockDB())

    with agent.override(model=TestModel(), deps=mock_deps):
        # All runs use TestModel and mock_deps
        result = agent.run_sync('Hello')
        assert result.output
```

## FunctionModel (Custom Logic)

For complete control over model responses:

```python
from pydantic_ai import Agent, ModelMessage, ModelResponse, TextPart
from pydantic_ai.models.function import AgentInfo, FunctionModel

def custom_model(
    messages: list[ModelMessage],
    info: AgentInfo
) -> ModelResponse:
    """Custom model that inspects messages and returns response."""
    # Access the last user message
    last_msg = messages[-1]

    # Return custom response
    return ModelResponse(parts=[TextPart('Custom response')])

agent = Agent(FunctionModel(custom_model))
result = agent.run_sync('Hello')
```

### FunctionModel with Tool Calls

```python
from pydantic_ai import Agent, ModelMessage, ModelResponse, TextPart, ToolCallPart
from pydantic_ai.models.function import AgentInfo, FunctionModel

def model_with_tools(
    messages: list[ModelMessage],
    info: AgentInfo
) -> ModelResponse:
    # First request: call a tool
    if len(messages) == 1:
        return ModelResponse(parts=[
            ToolCallPart(
                tool_name='get_data',
                args='{"id": 123}'
            )
        ])

    # After tool response: return final result
    return ModelResponse(parts=[TextPart('Done with tool result')])

agent = Agent(FunctionModel(model_with_tools))

@agent.tool_plain
def get_data(id: int) -> str:
    return f"Data for {id}"

result = agent.run_sync('Get data')
```

## VCR Cassettes (Recorded API Calls)

Record and replay real LLM API interactions:

```python
import pytest
from pydantic_ai import Agent

@pytest.mark.vcr
def test_with_recorded_response():
    """Uses recorded cassette from tests/cassettes/"""
    agent = Agent('openai:gpt-4o')
    result = agent.run_sync('Hello')
    assert 'hello' in result.output.lower()

# To record/update cassettes:
# uv run pytest --record-mode=rewrite tests/test_file.py
```

Cassette files are stored in `tests/cassettes/` as YAML.
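Because cassettes are written to disk and usually committed, it may be worth filtering credentials before recording. The sketch below assumes the `vcr_config` fixture convention from pytest-recording (the plugin that provides `--record-mode`); the header list is a guess and should be adjusted per provider.

```python
import pytest

# Assumed header names that can carry API keys; extend for the providers in use.
SENSITIVE_HEADERS = ['authorization', 'x-api-key']


@pytest.fixture(scope='module')
def vcr_config():
    """Options handed to VCR when cassettes are recorded or replayed."""
    return {'filter_headers': SENSITIVE_HEADERS}
```

Defining the fixture in `conftest.py` would apply it to every `@pytest.mark.vcr` test under that directory.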

## Inline Snapshots

Assert expected outputs with auto-updating snapshots:

```python
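from inline_snapshot import snapshot
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel

agent = Agent('openai:gpt-4o')

def test_agent_output():
    result = agent.run_sync('Hello', model=TestModel())

    # Start with an empty snapshot() and fill it with --inline-snapshot=create;
    # subsequent runs assert against the stored value
    assert result.output == snapshot('expected output here')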

# Update snapshots:
# uv run pytest --inline-snapshot=fix
```

## Gates: VCR cassettes and inline snapshots

Recording or fixing rewrites files on disk. Follow this sequence; do not skip steps.

1. **Replay pass (no record/fix flags):** Run `uv run pytest` on the target path; **all green** (or failures are understood and unrelated to the artifact you will refresh).
2. **Scope locked:** Identify the cassette under `tests/cassettes/` or the `snapshot(...)` assertion to update; confirm **only** those files should change.
3. **Record or fix:** Run **one** scoped command: `uv run pytest --record-mode=rewrite …` **or** `uv run pytest --inline-snapshot=fix …` for that path only.
4. **Post-condition:** Run the same tests again **without** record/fix flags; **all green**. Inspect `git diff` — only expected `.yaml` / snapshot changes.

If step 4 fails, revert unintended diffs and fix the test or model before re-recording.
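The `git diff` inspection in step 4 can be partly automated. The helper below is a hypothetical sketch, not part of this skill: it takes the paths reported by something like `git diff --name-only` and returns any that are neither cassette YAML files nor explicitly allowed test files.

```python
from pathlib import PurePosixPath


def unexpected_changes(changed_paths, allowed_files=(), cassette_dir='tests/cassettes'):
    """Return changed paths that are neither cassette YAML files nor explicitly allowed."""
    allowed = {PurePosixPath(p) for p in allowed_files}
    cassettes = PurePosixPath(cassette_dir)
    unexpected = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # A cassette is a YAML file located anywhere under the cassette directory.
        is_cassette = path.suffix in {'.yaml', '.yml'} and cassettes in path.parents
        if not is_cassette and path not in allowed:
            unexpected.append(raw)
    return unexpected


# Example paths are made up for illustration.
changed = [
    'tests/cassettes/test_agent/test_hello.yaml',
    'tests/test_agent.py',
    'src/app/agent.py',
]
leftover = unexpected_changes(changed, allowed_files=['tests/test_agent.py'])
print(leftover)  # ['src/app/agent.py']
```

An empty result means the diff stayed within the locked scope; anything else is a candidate for reverting before re-recording.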

## Testing Tools

```python
from pydantic_ai import Agent, RunContext
from pydantic_ai.models.test import TestModel

def test_tool_is_called():
    agent = Agent('openai:gpt-4o')
    tool_called = False

    @agent.tool_plain
    def my_tool(x: int) -> str:
        nonlocal tool_called
        tool_called = True
        return f"Result: {x}"

    # Force TestModel to call the tool
    result = agent.run_sync(
        'Use my_tool',
        model=TestModel(call_tools=['my_tool'])
    )

    assert tool_called
```

## Testing with Dependencies

```python
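from dataclasses import dataclass
from unittest.mock import AsyncMock

from pydantic_ai import Agent, RunContext
from pydantic_ai.models.test import TestModel

class ApiClient:
    """Placeholder for the real client; tests replace it with a mock."""
    async def fetch(self) -> dict:
        raise NotImplementedError

@dataclass
class Deps:
    api: ApiClient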

def test_tool_with_deps():
    # Create mock dependency
    mock_api = AsyncMock()
    mock_api.fetch.return_value = {'data': 'test'}

    agent = Agent('openai:gpt-4o', deps_type=Deps)

    @agent.tool
    async def fetch_data(ctx: RunContext[Deps]) -> dict:
        return await ctx.deps.api.fetch()

    with agent.override(
        model=TestModel(call_tools=['fetch_data']),
        deps=Deps(api=mock_api)
    ):
        result = agent.run_sync('Fetch data')

    mock_api.fetch.assert_called_once()
```

## Capture Messages

Inspe