Skip to main content
ClaudeWave
Skill444 estrellas del repoactualizado 4d ago

integration-e2e-testing

This skill provides structured guidance for designing integration and end-to-end tests across three test types: unit-scoped integration tests, fixture-driven E2E tests with mocked backends, and service-integration E2E tests against local stacks. Use it when planning test coverage for a feature, selecting between test lanes, calculating ROI for each test, or reviewing whether a test targets user-observable behavior rather than implementation details.

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/shinpr/claude-code-workflows /tmp/integration-e2e-testing && cp -r /tmp/integration-e2e-testing/dev-workflows-fullstack/skills/integration-e2e-testing ~/.claude/skills/integration-e2e-testing
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

# Integration and E2E Testing Principles

## References

**E2E test design**: See [references/e2e-design.md](references/e2e-design.md) for UI Spec-driven E2E test candidate selection and browser test architecture. The reference uses Playwright as the default browser harness; substitute the project's standard when different.

## Test Type Definition and Limits

| Test Type | Purpose | Scope | External Deps | Limit per Feature | Implementation Timing |
|-----------|---------|-------|---------------|-------------------|----------------------|
| Integration | Verify component interactions in-process | Partial system integration (in-process modules; for UI components, the framework's in-process renderer e.g., RTL+MSW for React/TS) | Mocked or in-process | MAX 3 | Created alongside implementation |
| fixture-e2e | Verify UI behavior in a browser with deterministic fixtures | Full UI flow with mocked backend / fixture-driven state | Mocked / fixture only — no live services | MAX 3 | Created alongside the UI feature |
| service-integration-e2e | Verify critical user journeys against a running local stack | Full system across services | Live local services or stubs | MAX 1-2 | Executed only in the final phase |

**Lane selection (E2E only)**:
- Default lane for user-facing UI journeys is **fixture-e2e** — it runs a real browser against deterministic fixtures, catches the bugs that unit/integration tests miss (button no-op, state never updates, navigation breaks), and runs in CI without infrastructure setup
- Add **service-integration-e2e** only when the journey's correctness depends on real cross-service behavior (data persistence, transactional consistency, external service contracts) that cannot be faked safely

The two E2E lanes are budgeted independently — having a fixture-e2e for a journey does not consume the service-integration-e2e budget and vice versa.

## Behavior-First Principle

### Include (High ROI)
- Business logic correctness (calculations, state transitions, data transformations)
- Data integrity and persistence behavior
- User-visible functionality completeness
- Error handling behavior (what user sees/experiences)

### Redirect to Other Test Types
- External service connections → Verify via contract/interface tests
- Performance metrics → Verify via dedicated load testing
- Implementation details → Verify observable behavior instead
- UI layout specifics → Verify information availability instead

**Principle**: Test = User-observable behavior verifiable in isolated CI environment

## ROI Calculation

ROI is used to **rank candidates within the same test type** (integration candidates against each other, E2E candidates against each other). Cross-type comparison is unnecessary because integration and E2E budgets are selected independently.

```
ROI Score = Business Value × User Frequency + Legal Requirement × 10 + Defect Detection
              (range: 0–120)
```

Higher ROI Score = higher priority within its test type. No normalization or capping is applied — the raw score is used directly for ranking. Deduplication is a separate step that removes candidates entirely; it does not modify scores.

### ROI Thresholds by Lane

The two E2E lanes have very different ownership costs and use independent thresholds.

| Lane | ROI threshold | Rationale |
|------|---------------|-----------|
| fixture-e2e | ROI ≥ 20 (beyond reserved slot) | Cost is comparable to integration tests once the harness exists; the floor avoids filling MAX 3 with low-signal tests when fewer would suffice |
| service-integration-e2e | ROI > 50 (beyond reserved slot) | Creation, execution, and maintenance cost is 3-10× higher than integration; reserve for journeys whose value cannot be proven any other way |

Reserved slot rules (see Multi-Step User Journey Definition below) apply per lane and override the threshold (the reserved candidate is emitted regardless of its ROI score). Below-floor candidates beyond the reserved slot are not emitted, leaving budget intentionally unfilled rather than padding with low-value tests.

### ROI Calculation Examples

| Scenario | BV | Freq | Legal | Defect | ROI Score | Test Type | Selection Outcome |
|----------|----|------|-------|--------|-----------|-----------|-------------------|
| Core checkout UI flow | 10 | 9 | true | 9 | 109 | fixture-e2e | Selected (reserved slot: user-facing multi-step journey, browser-level verification with fixtures) |
| Core checkout against live payment service | 10 | 9 | true | 9 | 109 | service-integration-e2e | Selected (real-service correctness above ROI threshold) |
| Dismiss button updates UI state | 6 | 7 | false | 8 | 50 | fixture-e2e | Selected (rank 2 of 3 fixture-e2e budget) |
| Payment error message display | 5 | 4 | false | 7 | 27 | fixture-e2e | Selected (rank 3 of 3 fixture-e2e budget) |
| Optional filter toggle | 3 | 4 | false | 2 | 14 | fixture-e2e | Not selected (rank 4, budget full) |
| Payment retry against real provider | 8 | 3 | false | 7 | 31 | service-integration-e2e | Below ROI threshold (31 < 50), not selected |
| DB persistence check | 8 | 8 | false | 8 | 72 | Integration | Selected (rank 1 of 3) |
| Pure data transformation | 5 | 3 | false | 4 | 19 | Integration | Selected (rank 2 of 3) |

## Multi-Step User Journey Definition

A feature qualifies as containing a **multi-step user journey** when ALL of the following are true:

1. **2+ distinct interaction boundaries** are traversed in sequence to complete a user goal. What counts as a boundary depends on the system type:
   - Web: distinct routes/pages
   - Mobile native: distinct screens/views
   - CLI: distinct command invocations or interactive prompts
   - API: distinct API calls forming a transaction (e.g., create → confirm → finalize)
2. **State carries across steps** — data produced or actions taken in one step affect what the next step accepts or displays
3. **The journey has a completion point** — a final state the user or caller reaches (e.g., confirmation pa
acceptance-test-generatorSubagent

Generates integration/E2E test skeletons from Design Doc ACs using ROI-based selection and journey-based E2E reservation. Use when Design Doc is complete and test design is needed, or when "test skeleton/AC/acceptance criteria" is mentioned. Behavior-first approach for minimal tests with maximum coverage.

code-reviewerSubagent

Validates Design Doc compliance and implementation completeness from third-party perspective. Use PROACTIVELY after implementation completes or when "review/implementation check/compliance" is mentioned. Provides acceptance criteria validation and quality reports.

code-verifierSubagent

Validates consistency between PRD/Design Doc and code implementation. Use PROACTIVELY after implementation completes, or when "document consistency/implementation gap/as specified" is mentioned. Uses multi-source evidence matching to identify discrepancies.

codebase-analyzerSubagent

Analyzes existing codebase objectively for facts about implementation, user behavior patterns, and technical architecture. Use when existing code needs to be understood without hypothesis bias. Invoked before Design Doc creation to produce focused guidance for technical designers.

design-syncSubagent

Detects conflicts across multiple Design Docs and provides structured reports. Use when multiple Design Docs exist, or when "consistency/conflict/sync/between documents" is mentioned. Focuses on detection and reporting only, no modifications.

document-reviewerSubagent

Reviews document consistency and completeness, providing approval decisions. Use PROACTIVELY after PRD/UI Spec/Design Doc/work plan creation, or when "document review/approval/check" is mentioned. Detects contradictions and rule violations with improvement suggestions.

integration-test-reviewerSubagent

Verifies consistency between test skeleton comments and implementation code. Use PROACTIVELY after test implementation completes, or when "test review/skeleton verification" is mentioned. Returns quality reports with failing items and fix instructions.

investigatorSubagent

Comprehensively collects problem-related information and creates evidence matrix. Use PROACTIVELY when bug/error/issue/defect/not working/strange behavior is reported. Reports only observations without proposing solutions.