nw-ad-critique-dimensions
nw-ad-critique-dimensions is a Claude Code skill that provides a structured peer review framework for acceptance tests across nine quality dimensions: happy path bias detection, Given-When-Then compliance, business language purity, coverage completeness, walking skeleton user-centricity, priority validation, observable behavior assertions, traceability coverage, and boundary proof validation. Use it during acceptance test handoff reviews to ensure tests are behavior-driven, implementation-agnostic, and stakeholder-understandable, and that they comprehensively cover functional requirements and edge cases.
git clone --depth 1 https://github.com/nWave-ai/nWave /tmp/nw-ad-critique-dimensions && cp -r /tmp/nw-ad-critique-dimensions/nWave/skills/nw-ad-critique-dimensions ~/.claude/skills/nw-ad-critique-dimensions
SKILL.md
# Acceptance Test Critique Dimensions
Load when performing peer review of acceptance tests (during *handoff-develop).
## Dimension 1: Happy Path Bias
**Pattern**: Only successful scenarios, error paths missing.
Detection: Count success vs error scenarios. Error scenarios should make up at least 40% of the total. Missing coverage examples: Login success but no invalid password | Payment processed but no decline/timeout | Search results but no empty/error cases.
Severity: blocker (production error handling untested).
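The count-and-compare detection for Dimension 1 can be sketched as follows. This is a minimal sketch, assuming the reviewer has already classified each scenario as success or error; the `Scenario` type and its `is_error` flag are hypothetical and not part of the skill. Only the 40% threshold comes from the rule above.

```python
from dataclasses import dataclass

# Threshold from the detection rule: error scenarios >= 40% of all scenarios.
MIN_ERROR_RATIO = 0.40


@dataclass
class Scenario:
    title: str
    is_error: bool  # hypothetical flag set by the reviewer for error-path scenarios


def error_ratio(scenarios):
    """Return the share of scenarios that exercise an error path."""
    if not scenarios:
        return 0.0
    return sum(s.is_error for s in scenarios) / len(scenarios)


def has_happy_path_bias(scenarios):
    """True when error scenarios fall below the 40% threshold."""
    return error_ratio(scenarios) < MIN_ERROR_RATIO


scenarios = [
    Scenario("Customer logs in with valid credentials", is_error=False),
    Scenario("Customer pays by card", is_error=False),
    Scenario("Customer searches the catalog", is_error=False),
    Scenario("Login is rejected for an invalid password", is_error=True),
]

print(error_ratio(scenarios), has_happy_path_bias(scenarios))
```

Here one of four scenarios covers an error path, so the suite would be flagged. The classification itself stays a reviewer judgment; the sketch only automates the arithmetic.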
## Dimension 2: GWT Format Compliance
**Pattern**: Scenarios violate Given-When-Then structure.
Violations: Missing Given context | Multiple When actions (split into separate scenarios) | Then with technical assertions instead of business outcomes. Each scenario: Given (context), When (single action), Then (observable outcome).
Severity: high (tests not behavior-driven).
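The structural part of the Dimension 2 check (missing Given, multiple When actions) lends itself to a mechanical pass. The sketch below is illustrative only: it assumes plain-text Gherkin with one step per line and treats `And`/`But` as continuing the previous keyword. The function name `gwt_violations` is hypothetical. It does not judge whether a Then step is technical or business-facing; that remains a reviewer call.

```python
def gwt_violations(scenario_text):
    """Return a list of Given-When-Then structure violations for one scenario."""
    counts = {"Given": 0, "When": 0, "Then": 0}
    current = None  # keyword that And/But steps inherit
    for raw in scenario_text.splitlines():
        words = raw.strip().split(maxsplit=1)
        if not words:
            continue
        keyword = words[0]
        if keyword in counts:
            current = keyword
        elif keyword in ("And", "But") and current is not None:
            keyword = current
        else:
            continue  # scenario titles, tags, comments
        counts[keyword] += 1

    violations = []
    if counts["Given"] == 0:
        violations.append("missing Given context")
    if counts["When"] == 0:
        violations.append("missing When action")
    elif counts["When"] > 1:
        violations.append("multiple When actions (split into separate scenarios)")
    if counts["Then"] == 0:
        violations.append("missing Then outcome")
    return violations


good = """
Scenario: Customer pays for an order
  Given a customer has an order awaiting payment
  When the customer pays by card
  Then the order is confirmed
"""

bad = """
Scenario: Customer logs in and pays
  When the customer logs in
  And the customer pays by card
  Then the order is confirmed
"""

print(gwt_violations(good))
print(gwt_violations(bad))
```

The second scenario is flagged twice: it has no Given, and the `And` after `When` counts as a second action.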
## Dimension 3: Business Language Purity
**Pattern**: Technical terms leak into acceptance tests.
Flag: database, API, HTTP, REST, JSON, classes, methods, services, controllers, status codes (500, 404), infrastructure (Redis, Kafka, Lambda).
Business alternatives: "Customer data is stored" not "Database persists record" | "Order is confirmed" not "API returns 200 OK" | "Payment fails" not "Gateway throws exception"
Severity: high (tests coupled to implementation).
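A first pass over the flag list can be automated with a word-boundary search. This is a rough sketch under stated assumptions: the term lists are a hand-picked subset of the flag list above, acronyms and product names are matched case-sensitively so that ordinary words such as "rest" are not flagged, and the function name `technical_terms` is hypothetical. Hits still need human review, since a number like 500 may be a legitimate business amount.

```python
import re

# Subset of the flag list above (an assumption, extend as needed).
CASE_SENSITIVE = ["API", "HTTP", "REST", "JSON", "Redis", "Kafka", "Lambda", "500", "404"]
CASE_INSENSITIVE = ["database", "class", "method", "service", "controller"]

_PATTERN = re.compile(
    r"\b(?:" + "|".join(CASE_SENSITIVE) + r")\b"
    + r"|(?i:\b(?:" + "|".join(CASE_INSENSITIVE) + r")(?:e?s)?\b)"
)


def technical_terms(step):
    """Return the technical terms found in one scenario step, in order."""
    return [m.group(0) for m in _PATTERN.finditer(step)]


print(technical_terms("Then the API returns 200 OK"))
print(technical_terms("Then the order is confirmed"))
```

The first step is flagged for `API`; the business-language alternative passes without hits.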
## Dimension 4: Coverage Completeness
**Pattern**: User stories lack acceptance test coverage.
Validation: Map each story to scenarios | Verify all acceptance criteria (AC) have corresponding tests | Confirm edge cases and boundaries tested.
Severity: blocker (unverified requirements).
## Dimension 5: Walking Skeleton User-Centricity
**Pattern**: Walking skeletons describe technical layer connectivity instead of user value.
Detection litmus test for `@walking_skeleton` scenarios:
- Title describes user goal or technical flow?
- Then steps describe user observations or internal side effects?
- Could non-technical stakeholder confirm "yes, that is what users need"?
Violations: "End-to-end order flow through all layers" (technical framing) | Then "order row inserted in database" (internal side effects) | Given "database contains user record" instead of "customer has an account"
Severity: high (skeletons that only prove wiring miss the point -- first skeleton should be demo-able to stakeholder).
## Dimension 6: Priority Validation
**Pattern**: Tests address secondary concerns while larger gaps exist.
Questions: 1. Is this the largest bottleneck? (timing data or gap analysis) | 2. Simpler alternatives considered? | 3. Constraint prioritization correct? | 4. Test design decisions data-justified?
Severity: blocker if wrong problem addressed, high if no measurement data.
## Dimension 7: Observable Behavior Assertions
**Pattern**: Tests assert internal state or method calls instead of observable behavior.
For EVERY Then step in EVERY scenario, apply this mechanical checklist:
1. Does the assertion check a return value from a driving port call? YES = pass, NO = flag.
2. Does the assertion check an observable outcome (user sees X, system produces Y)? YES = pass, NO = flag.
3. Does the assertion check internal state, private fields, or method call counts? YES = REJECT the scenario.
**Concrete violations to flag**:
- `assert mock_repo.save.called` — asserts method call, not observable outcome
- `assert len(db.query(Order).all()) == 1` — asserts internal DB state
- `assert obj._internal_field == "value"` — asserts private state
- `assert os.path.exists("output.json")` — asserts file existence (implementation detail)
**Concrete passing assertions**:
- `assert result.is_confirmed()` — observable business outcome
- `assert result.order_number is not None` — return value from driving port
- `assert "confirmation" in customer_notification.subject` — observable user outcome
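To show the passing assertions in context, here is a minimal sketch of a test that asserts only on what the driving port returns. `OrderingPort` and `OrderResult` are hypothetical stand-ins with in-memory storage, invented for illustration; they are not part of the skill or of any real library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrderResult:
    """Hypothetical value returned by the driving port."""
    order_number: Optional[str]
    confirmed: bool

    def is_confirmed(self):
        return self.confirmed


class OrderingPort:
    """Hypothetical driving port with in-memory storage, for illustration only."""

    def __init__(self):
        self._orders = []  # internal state: tests should not inspect this

    def place_order(self, customer, item):
        number = f"ORD-{len(self._orders) + 1}"
        self._orders.append((number, customer, item))
        return OrderResult(order_number=number, confirmed=True)


def test_customer_receives_order_confirmation():
    # Given a customer with an item to buy
    port = OrderingPort()
    # When the customer places the order
    result = port.place_order(customer="alice", item="book")
    # Then the order is confirmed (assert on the returned result only)
    assert result.is_confirmed()
    assert result.order_number is not None
    # Rejected style: assert len(port._orders) == 1  (internal state)


test_customer_receives_order_confirmation()
```

Because the test never touches `_orders`, the storage behind the port can be replaced without changing the test.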
**Relationship to Dim 5 (Walking Skeleton User-Centricity)**:
- Dim 5 validates walking skeleton SCOPE (user goal framing vs technical layer framing)
- Dim 7 validates ASSERTION TYPE for ALL scenarios (walking skeletons AND focused scenarios)
- A scenario can pass Dim 5 (good user-centric framing) and fail Dim 7 (internal state assertions)
Severity: high (tests coupled to implementation break on refactoring).
## Dimension 8: Traceability Coverage
**Pattern**: Scenarios exist without traceability to upstream wave artifacts.
Two mandatory traceability checks:
**Check A — Story-to-Scenario mapping**:
1. Read `docs/feature/{feature-id}/discuss/user-stories.md`
2. Extract ALL story IDs (e.g., US-01, US-02)
3. For EACH story ID, verify at least one scenario references it (via tag or comment)
4. Flag EVERY story ID with zero matching scenarios as BLOCKER
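Check A can be sketched as a set difference between the story IDs found in the stories file and those referenced by scenarios. The sketch assumes IDs follow the `US-<digits>` shape shown in the examples above and that scenarios reference them on tag lines (`@US-01`) or comment lines (`# covers US-01`). Both conventions, and the function name `untraced_stories`, are assumptions.

```python
import re

# Assumed ID shape, based on the examples US-01, US-02.
STORY_ID = re.compile(r"\bUS-\d+\b")


def untraced_stories(user_stories_md, feature_text):
    """Return story IDs that no scenario tag or comment references, sorted."""
    story_ids = set(STORY_ID.findall(user_stories_md))
    referenced = set()
    for line in feature_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(("@", "#")):  # tag lines and comment lines
            referenced.update(STORY_ID.findall(stripped))
    return sorted(story_ids - referenced)


stories = "## US-01 Log in\n## US-02 Pay for order\n## US-03 Track delivery\n"
feature = (
    "@US-01\n"
    "Scenario: Customer logs in\n"
    "  Given a customer has an account\n"
    "# covers US-03\n"
    "Scenario: Customer tracks a delivery\n"
)

print(untraced_stories(stories, feature))
```

Every ID in the returned list would be reported as a blocker under Check A.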
**Check B — Environment-to-Scenario mapping**:
1. Read `docs/feature/{feature-id}/devops/environments.yaml`
2. If missing, use defaults: `clean`, `with-pre-commit`, `with-stale-config`
3. For EACH environment, verify at least one walking skeleton includes a Given clause referencing that environment's preconditions
4. Flag EVERY environment with zero matching Given clauses as HIGH
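Check B can be approximated in the same spirit. The sketch below makes two loud assumptions: a Given clause "references" an environment when it contains the environment's name literally, and the environment names have already been read from `environments.yaml` by the caller (the file's schema is not specified here, so no parsing is attempted). Only the three default names come from the steps above; `uncovered_environments` is a hypothetical helper.

```python
# Defaults from step 2 above, used when environments.yaml is missing.
DEFAULT_ENVIRONMENTS = ["clean", "with-pre-commit", "with-stale-config"]


def uncovered_environments(skeleton_givens, environments=None):
    """Return environments that no walking-skeleton Given clause mentions.

    skeleton_givens: Given-clause texts taken from @walking_skeleton scenarios.
    environments: names read from environments.yaml, or None if the file is missing.
    """
    envs = environments if environments else DEFAULT_ENVIRONMENTS
    return [
        env for env in envs
        # Assumption: a literal name match counts as a reference.
        if not any(env in given for given in skeleton_givens)
    ]


givens = [
    "Given a clean environment",
    "Given a with-stale-config environment",
]

print(uncovered_environments(givens))
```

Each name returned would be reported as HIGH. A literal substring match is crude, so a real review would still confirm that the Given clause sets up that environment's actual preconditions.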
**What this dimension does NOT cover**:
- KPI measurability — that is PO-reviewer scope during DELIVER post-merge gate
- Scenario quality — covered by Dims 1-7
Severity: blocker for Check A (untraceable requirements), high for Check B (untested environments).
## Review Output Format
```yaml
review_id: "accept_rev_{timestamp}"
reviewer: "acceptance-designer (review mode)"
strengths:
  - "{positive test design aspect with example}"
issues_identified:
  happy_path_bias:
    - issue: "Feature {name} only tests success"
      severity: "blocker"
      recommendation: "Add error scenarios: invalid input, timeout, service failure"
  gwt_format:
    - issue: "Scenario has multiple When actions"
      severity: "high"
      recommendation: "Split into separate scenarios"
  business_language:
    - issue: "Technical term '{term}' in scenario"
      severity: "high"
      recommendation: "Repl…"
```