ClaudeWave
Skill · 462 repo stars · updated 3d ago

Real Pytest - No Mocks, Real Tests

This Claude Code skill provides patterns for writing integration tests against actual public interfaces without mocking components. Use it when developing or reviewing pytest test suites to ensure that tests verify real module contracts rather than implementation details, enforce critical guarantees, and fail when code breaks instead of mirroring existing behavior.

Install in Claude Code
git clone --depth 1 https://github.com/taylorsatula/mira-OSS /tmp/real-pytest---no-mocks-real-tests && cp -r /tmp/real-pytest---no-mocks-real-tests/.claude/skills/pytest-real-testing ~/.claude/skills/real-pytest---no-mocks-real-tests
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Real Testing Philosophy

## CRITICAL MINDSET SHIFT

**Tests that verify implementation are worse than no tests** - they provide false confidence while catching nothing.

**Your job is not to confirm the code works. Your job is to:**
1. **Think critically about the contract** - what SHOULD this module do?
2. **Surface design problems** - is this module papering over architectural failures?
3. **Write tests that enforce guarantees** - not tests that mirror implementation
4. **Prove tests can fail** - see them fail first, verify failure modes are correct

Tests that always pass are actively harmful. They waste time and provide false security.
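Point 4 can be practiced directly: run a contract test against a deliberately broken copy of the code (a "mutant") and confirm that it goes red. The sketch below is hypothetical throughout. It assumes an invented `apply_discount(price, percent)` whose contract is "the result is never negative and never above the original price", and `broken_discount` is an invented mutant, not part of any real module. The test functions use plain `assert`, so pytest collects them without extra imports.

```python
# Hypothetical function under test. Contract: for a non-negative price and a
# percent in [0, 100], the result is never negative and never above the price.
def apply_discount(price: float, percent: float) -> float:
    if price < 0:
        raise ValueError("price must be non-negative")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


# Contract test: asserts the guarantee, not how the number is computed.
# `fn` has a default value, so pytest does not treat it as a fixture.
def test_discount_stays_within_bounds(fn=apply_discount):
    for percent in (0, 10, 50, 100):
        result = fn(100.0, percent)
        assert 0 <= result <= 100.0, (percent, result)


# Invented mutant: adds the discount instead of subtracting it.
def broken_discount(price: float, percent: float) -> float:
    return round(price * (1 + percent / 100), 2)


# Meta-test: the contract test must FAIL against the mutant; otherwise it
# is a test that always passes and proves nothing.
def test_contract_catches_broken_implementation():
    try:
        test_discount_stays_within_bounds(fn=broken_discount)
    except AssertionError:
        return
    raise AssertionError("contract test passed against a broken implementation")
```

Mutation-testing tools automate this idea across a whole codebase; a hand-written mutant keeps the sketch dependency-free.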

### 🚨 NEVER SKIP TESTS

**ABSOLUTE RULE: Do NOT use `@pytest.mark.skip`, `@pytest.mark.skipif`, or `pytest.skip()`**

Tests either:
- ✅ **PASS** - the code works correctly
- ❌ **FAIL** - the code is broken and needs fixing

There is no third state. Skipped tests are:
- Technical debt pretending to be documentation
- Broken code that someone gave up on
- False confidence in test coverage metrics

**If a test can't run:**
- Fix the environment/dependencies so it can run
- Fix the code so the test passes
- Delete the test if it's testing something that doesn't exist

**NEVER commit a skipped test.** Either make it pass or delete it.

---

## PHASE 1: Contract-First Analysis (DO THIS FIRST)

**NEVER write tests by reading implementation.** That's how you write tests that mirror what code does instead of what it should do.

### Protocol: Analyze Contract Without Reading Implementation

**Step 1: Read ONLY the module's public interface**
```python
from typing import Any, Dict

# Read THIS (public interface)
class ReminderTool:
    def run(self, operation: str, **kwargs) -> Dict[str, Any]:
        """Execute reminder operations."""
        ...

# DO NOT read implementation details
# DO NOT look at internal methods
# DO NOT read how it's implemented
```
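Step 1 can be done without reading any method bodies. The sketch below uses the standard-library `inspect` module to list a class's public methods with their signatures and the first line of each docstring, skipping underscore-prefixed names. `public_interface` is a hypothetical helper name, and the `ReminderTool` here is a simplified stand-in for the interface above, with an invented private `_parse` method to show that it is filtered out.

```python
import inspect


def public_interface(cls: type) -> list[str]:
    """Describe public callables as 'name(signature): docstring summary'."""
    entries = []
    for name, member in inspect.getmembers(cls, callable):
        if name.startswith("_"):
            continue  # private and dunder members are implementation detail
        doc_lines = (inspect.getdoc(member) or "").splitlines()
        summary = doc_lines[0] if doc_lines else ""
        entries.append(f"{name}{inspect.signature(member)}: {summary}")
    return entries


# Simplified stand-in for the interface above; _parse is invented.
class ReminderTool:
    def run(self, operation: str, **kwargs) -> dict:
        """Execute reminder operations."""
        ...

    def _parse(self, raw: str) -> dict:
        ...


for entry in public_interface(ReminderTool):
    print(entry)
# run(self, operation: str, **kwargs) -> dict: Execute reminder operations.
```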

**Step 2: Document the contract**

Before writing any test, answer these questions in writing:

```
MODULE CONTRACT ANALYSIS
========================

1. What is this module's PURPOSE?
   - What problem does it solve?
   - Why does it exist?

2. What GUARANTEES does it provide?
   - What promises does the API make?
   - What invariants must hold?
   - What post-conditions are guaranteed?

3. What should SUCCEED?
   - Valid inputs
   - Happy path scenarios
   - Boundary cases that should work

4. What should FAIL?
   - Invalid inputs
   - Boundary conditions that should error
   - Security violations
   - Resource constraints

5. What are the DEPENDENCIES?
   - What does this module depend on?
   - Are there too many dependencies?
   - Could this be simpler?

6. ARCHITECTURAL CONCERNS:
   - Is this module doing too much?
   - Is it papering over design failures elsewhere?
   - Does the contract make sense or is it convoluted?
   - Should this module even exist?
```

**Step 3: Design test cases from contract**

Based on contract analysis (NOT implementation):
- List positive test cases (what should work)
- List negative test cases (what should fail)
- List boundary conditions
- List security concerns
- List performance concerns
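As an illustration of step 3, the case lists below are written from a contract before any test logic. Everything here is hypothetical: `parse_duration` is an invented function whose assumed contract is "convert strings such as `1h30m` into whole seconds, with units in h, m, s order; reject empty, negative, fractional, unknown-unit, or out-of-order input". The implementation is only a stand-in so the block runs. In a real suite these lists would typically feed `@pytest.mark.parametrize`; plain loops keep the sketch free of third-party imports.

```python
import re

# Stand-in implementation so the sketch runs; the case lists are what matter.
_DURATION = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")


def parse_duration(text: str) -> int:
    """Convert '1h30m'-style strings to seconds (hypothetical contract)."""
    cleaned = text.strip()
    match = _DURATION.fullmatch(cleaned)
    if not cleaned or match is None:
        raise ValueError(f"invalid duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


# Cases derived from the contract, not from reading the implementation.
SHOULD_SUCCEED = [("1h30m", 5400), ("45s", 45), ("2h", 7200)]
BOUNDARY = [("0s", 0), ("1h0m0s", 3600)]
SHOULD_FAIL = ["", "   ", "-5m", "10x", "30m1h", "1.5h"]


def test_valid_durations():
    for text, expected in SHOULD_SUCCEED + BOUNDARY:
        assert parse_duration(text) == expected, text


def test_invalid_durations_raise():
    for text in SHOULD_FAIL:
        try:
            parse_duration(text)
        except ValueError:
            continue
        raise AssertionError(f"{text!r} should have been rejected")
```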

**See the "CANONICAL EXAMPLE" section below for a complete contract-analysis walkthrough.**

---

## PHASE 1.5: Contract Verification (VALIDATE YOUR ASSUMPTIONS)

**CRITICAL**: Do NOT read the implementation file yourself. Use the contract-extractor agent as an abstraction barrier.

### Why This Phase Exists

You've formed expectations about the contract from the interface. Now verify those expectations against the actual implementation WITHOUT seeing the implementation yourself. The agent reads the code and reports ONLY contract facts (not implementation details).

### Protocol: Invoke Agent → Compare → Identify Gaps

**Step 1: Invoke the contract-extractor agent**

```
# Use the Task tool to invoke the agent (tool-call syntax, not a shell command)
Task(
    subagent_type="contract-extractor",
    description="Extract contract from module",
    prompt="""Extract the contract from: path/to/module.py

Return:
- Public interface (methods, signatures, types)
- Actual return structures (dict keys, types)
- Exception contracts (what raises what, when)
- Edge cases handled
- Dependencies and architectural concerns"""
)
```

**Step 2: Compare your expectations against agent report**

Create a comparison:

```
EXPECTATION vs REALITY
======================

Expected return structure:
{
    "status": str,
    "results": list
}

Actual return structure (from agent):
{
    "status": str,
    "confidence": float,  # I MISSED THIS
    "results": list,
    "result_count": int   # I MISSED THIS
}

Expected exceptions:
- ValueError for empty query

Actual exceptions (from agent):
- ValueError for empty query ✓
- ValueError for negative max_results  # I MISSED THIS

Expected edge cases:
- Empty results returns []

Actual edge cases (from agent):
- Empty results returns status="low_confidence", confidence=0.0, results=[]
  # More nuanced than I expected
```
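The structural half of this comparison can be mechanized once the agent's report is written down. The sketch below assumes both sides are recorded as plain `{key: type}` dicts; `diff_contract` is a hypothetical helper, and the sample data mirrors the return structures above.

```python
def diff_contract(expected: dict[str, type], actual: dict[str, type]) -> dict[str, list[str]]:
    """Compare an expected return structure with the one the agent reported."""
    shared = expected.keys() & actual.keys()
    return {
        # Keys the code returns that the expectation missed (need tests).
        "missed": sorted(actual.keys() - expected.keys()),
        # Keys expected but absent from the code (bug or misunderstanding?).
        "not_in_code": sorted(expected.keys() - actual.keys()),
        # Keys present on both sides with different types.
        "type_mismatch": sorted(k for k in shared if expected[k] is not actual[k]),
    }


expected = {"status": str, "results": list}
actual = {"status": str, "confidence": float, "results": list, "result_count": int}

print(diff_contract(expected, actual))
# {'missed': ['confidence', 'result_count'], 'not_in_code': [], 'type_mismatch': []}
```

Each entry in the output becomes one discrepancy to classify in Step 3.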

**Step 3: Identify discrepancies and their implications**

For each discrepancy, ask:
- Is the code **wrong** (doesn't match intended contract)?
- Is the contract **unclear** (missing documentation)?
- Did I **misunderstand** the requirements?
- Is this an **undocumented feature** (needs test)?

**Example Analysis**:
```
DISCREPANCY: Agent reports confidence field in return, I didn't expect it
IMPLICATION: This is part of the contract - add test to verify confidence in [0.0, 1.0]

DISCREPANCY: Agent reports ValueError for negative max_results, I didn't expect it
IMPLICATION: Good edge case handling - add negative test

DISCREPANCY: Agent reports 8 dependencies, I expected 3-4
IMPLICATION: ARCHITECTURAL CONCERN - too many deps, report to human
```
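The first two implications translate into tests like the ones below. `search(query, max_results)` is a hypothetical stand-in written to match the contract the agent reported in Step 2; the `"ok"` status for successful searches and the confidence formula are invented for the sketch. The tests check only the reported guarantees: confidence within [0.0, 1.0], `result_count` consistent with `results`, `ValueError` for an empty query or a negative `max_results`, and the exact low-confidence shape for empty results.

```python
from typing import Any

_CORPUS = ["pytest fixtures", "pytest parametrize", "mocking pitfalls"]


def search(query: str, max_results: int = 10) -> dict[str, Any]:
    """Hypothetical stand-in following the contract reported by the agent."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if max_results < 0:
        raise ValueError("max_results must be non-negative")
    hits = [doc for doc in _CORPUS if query.lower() in doc][:max_results]
    if not hits:
        return {"status": "low_confidence", "confidence": 0.0, "results": [], "result_count": 0}
    # Invented confidence formula: fraction of the corpus that matched.
    return {"status": "ok", "confidence": len(hits) / len(_CORPUS),
            "results": hits, "result_count": len(hits)}


def expect_value_error(call) -> None:
    """Fail unless call() raises ValueError (pytest.raises does this in a real suite)."""
    try:
        call()
    except ValueError:
        return
    raise AssertionError("expected ValueError")


def test_confidence_is_a_probability():
    for query in ("pytest", "mocking", "no-such-topic"):
        assert 0.0 <= search(query)["confidence"] <= 1.0


def test_result_count_matches_results():
    result = search("pytest", max_results=1)
    assert result["result_count"] == len(result["results"]) == 1


def test_invalid_arguments_raise():
    expect_value_error(lambda: search(""))
    expect_value_error(lambda: search("pytest", max_results=-1))


def test_empty_results_are_low_confidence():
    assert search("no-such-topic") == {
        "status": "low_confidence", "confidence": 0.0, "results": [], "result_count": 0,
    }
```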

**Step 4: Update test plan based on verified contract**

Now you know:
- What the code actually returns (test these exact structures)
- What exceptions are actually raised (test these exact cases)
- What edge cases are actually handled (test these behaviors)
- What architectural pr
code-renamer (Subagent)

Use this agent when you need to rename classes, methods, functions, or variables in code files to align with specific naming requirements or conventions. Examples: <example>Context: User wants to clean up function names by removing a specific prefix. user: 'Please remove the prefix get_ from all function names in this file' assistant: 'I'll use the code-renamer agent to systematically rename all functions by removing the get_ prefix' <commentary>The user wants systematic renaming of functions, which is exactly what the code-renamer agent is designed for.</commentary></example> <example>Context: User wants to standardize method naming conventions. user: 'Can you rename all the camelCase methods to snake_case in this class?' assistant: 'I'll use the code-renamer agent to convert all camelCase method names to snake_case convention' <commentary>This is a systematic renaming task that requires careful attention to naming conventions.</commentary></example>

investigative-sidekick (Subagent)

Use this agent when the user makes offhanded comments, rhetorical questions, or expresses wishes about understanding something better. Trigger on patterns like:

<example>
Context: User is reviewing code and sees an assistant's explanation about how a function works.
user: "Can you believe this? Is this even right?"
assistant: "Let me use the investigative-sidekick agent to verify if that explanation is accurate."
<task tool_call to investigative-sidekick with context about what needs verification>
</example>

<example>
Context: User is debugging and expresses frustration.
user: "I wish I could figure out what's causing this memory leak in the session handler"
assistant: "I'll use the investigative-sidekick agent to investigate the root cause of that memory leak."
<task tool_call to investigative-sidekick with the specific problem to investigate>
</example>

<example>
Context: User reads a commit message claiming a performance improvement.
user: "Did this actually make things faster though?"
assistant: "Let me launch the investigative-sidekick agent to verify that performance claim."
<task tool_call to investigative-sidekick to fact-check the performance assertion>
</example>

<example>
Context: User is reviewing documentation that seems questionable.
user: "This doesn't seem right - are we really supposed to use sync calls in async contexts?"
assistant: "I'm going to use the investigative-sidekick agent to investigate whether that's actually correct."
<task tool_call to investigative-sidekick to verify the technical claim>
</example>

Activate proactively when the user:
- Questions accuracy or truthfulness ("Can you believe...", "Is this right?", "Really?")
- Expresses wishes about understanding ("I wish I could figure out...", "I'd love to know...")
- Shows skepticism ("Did this actually...", "Does this really...")
- Asks rhetorical questions that imply investigation ("What's causing...", "Why is this...")
- Doubts explanations or documentation they're reading

think (Slash Command)

Control thinking token limits via environment variable

validate-module (Slash Command)

Run complete two-agent validation on module+tests (contract extraction + test validation). Binary pass/fail with specific issues.

Code Consistency - Logging & Standards (Skill)

Check Python logging levels and patterns for correctness. Focus on identifying wrong severity levels and missing exception handling. Use when reviewing code quality.

contextvar-opportunity-finder (Skill)

Detect explicit user_id parameters in functions to identify potential opportunities for using ambient context. This is an investigation tool that flags instances for human review, not a prescriptive analyzer.

contextvar-remediation (Skill)

fail-fast-no-hedging (Skill)

Eliminate component hedging anti-patterns that mask infrastructure failures. Build systems that fail loudly when broken instead of limping along in degraded states. Critical for production reliability and operational visibility.