
tool-design-sprint-test-and-score

tool-design-sprint-test-and-score produces a Friday Design Sprint closing artifact synthesizing live observations from five target-profile customer interviews testing Thursday's prototype, including per-customer notes, best quotes, a scorecard grid scoring the sprint's research questions, observed patterns, team member hot takes, and the Decider's build/iterate/pivot/stop decision with supporting rationale. Use this skill when the prototype passed Thursday's trial run, five confirmed participants are scheduled, the team can observe interviews live, and the Decider is present Friday afternoon for the post-interview review.

Install in Claude Code
git clone --depth 1 https://github.com/product-on-purpose/pm-skills /tmp/tool-design-sprint-test-and-score && cp -r /tmp/tool-design-sprint-test-and-score/skills/tool-design-sprint-test-and-score ~/.claude/skills/tool-design-sprint-test-and-score
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

<!-- PM-Skills | https://github.com/product-on-purpose/pm-skills | Apache 2.0 -->

# Design Sprint Test and Score (Friday)

Friday is the sprint's payoff. 5 target-profile customers run the prototype while the team observes; the team synthesizes observations into a scorecard against the sprint questions; the Decider makes the build / iterate / pivot / stop call by end-of-day. The week's 35-40 person-days plus customer recruiting cost converts into one actionable decision.

Family contract: [`docs/reference/skill-families/design-sprint-skills-contract.md`](../../docs/reference/skill-families/design-sprint-skills-contract.md). This skill is a member of `design-sprint-skills`.

## When to Use

- It is Day 5 of the Design Sprint and Thursday's prototype passed its trial run.
- 5 confirmed participants are scheduled (canonical). 4 is acceptable if 1 cancelled and there is no buffer participant; pause if below 4.
- The team can observe interviews live (in-person or via Zoom breakout room) and synthesize during the day.
- The Decider is present Friday PM for the post-interview review (canonically 14:00-18:00 PT window covering observation of slots 4-5 plus Decider review by 17:30 PT).

## When NOT to Use

- Thursday's prototype did not pass its trial run. Re-run the trial; if it is still failing at 19:00 PT Thursday, postpone Friday.
- Fewer than 3 customers confirmed. Per Ratified Decision 3, the canonical guidance is 5 customers; 3-4 or 6-7 gets a documented warning; below 3 or above 7 should trigger a re-decision (postpone or split testing). Note: the v0.1.0 family validator does NOT mechanically enforce these thresholds (cohort count is in the EXAMPLE artifact, not in frontmatter); enforcement is a v2.16 validator-expansion candidate.
- The Decider is unavailable for the post-interview review window. Without the Decider, the day produces observations without a call.
- The team plans to use this skill to write the executive memo. Per Ratified Decision 4: exec memo authoring is delegated to `foundation-stakeholder-update` (existing pm-skills foundation skill); this skill produces the Decider summary only.
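
The cohort-size thresholds quoted from Ratified Decision 3 can be sketched as a small check. This is a hypothetical helper, not part of pm-skills (the list above notes that the v0.1.0 family validator does not enforce these thresholds), and the name `cohort_status` and its return labels are assumptions made for illustration. It does not model the stricter "pause if below 4" guidance under When to Use.

```python
def cohort_status(confirmed: int) -> str:
    """Classify a confirmed-participant count using the Ratified Decision 3 thresholds.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    if confirmed == 5:
        return "canonical"
    if 3 <= confirmed <= 7:
        # 3-4 or 6-7: proceed, but record a documented warning
        return "warning"
    # below 3 or above 7: re-decide (postpone or split testing)
    return "re-decide"


for count in (2, 4, 5, 7, 8):
    print(count, cohort_status(count))
```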

## What This Skill Produces

A single bundled artifact with six sections:

1. **Per-customer interview observation notes**: one section per customer; covers Context (Act 2) reactions, Tasks (Act 4) behavior with timestamps, Debrief (Act 5) reactions including pricing. Captured live during the day's interviews.
2. **Best quotes**: 5-15 verbatim customer quotes the team flags as most signal-bearing. Used in the Decider summary and in any downstream pitch or planning artifact.
3. **Scorecard grid**: rows are the sprint questions (from Monday); columns are the 5 customers; each cell is Y / N / partial / unclear with a one-line note; rightmost column is the team's day-end decision per question (Validated / Invalidated / Inconclusive).
4. **Observed patterns**: 4 buckets (worked, hesitated, broke trust, unexpected) with 2-4 patterns per bucket. Each pattern names how many customers showed it.
5. **Hot takes**: one short paragraph per team member capturing their personal read on Friday before group synthesis biases the read. Written silently in parallel.
6. **Decider summary**: the Decider's call (build / iterate / pivot / stop / reframe) plus the highest-confidence learning, the most important revision the team would make to the prototype direction, and the next artifact the team will produce (the post-sprint deliverable).

See `references/TEMPLATE.md` for the canonical structure and `references/EXAMPLE.md` for the Brainshelf book-catalog Friday artifact.
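
As a rough illustration of the six sections and the counts stated for them (5-15 best quotes, 4 pattern buckets with 2-4 patterns each), the sketch below models the artifact as a dataclass with a consistency check. The class name, field names, and `check_artifact` are assumptions made for this example; the canonical structure remains `references/TEMPLATE.md`, which this sketch does not reproduce.

```python
from dataclasses import dataclass, field

# Bucket names taken from the "Observed patterns" section description.
PATTERN_BUCKETS = ("worked", "hesitated", "broke trust", "unexpected")


@dataclass
class FridayArtifact:
    """Hypothetical container for the six sections of the bundled artifact."""
    observation_notes: dict = field(default_factory=dict)  # customer ID -> notes
    best_quotes: list = field(default_factory=list)        # verbatim quotes
    scorecard: dict = field(default_factory=dict)          # question -> cell values
    patterns: dict = field(default_factory=dict)           # bucket -> list of patterns
    hot_takes: dict = field(default_factory=dict)          # team member -> paragraph
    decider_summary: str = ""


def check_artifact(artifact: FridayArtifact) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if not 5 <= len(artifact.best_quotes) <= 15:
        problems.append(f"best_quotes: expected 5-15, found {len(artifact.best_quotes)}")
    for bucket in PATTERN_BUCKETS:
        count = len(artifact.patterns.get(bucket, []))
        if not 2 <= count <= 4:
            problems.append(f"patterns[{bucket!r}]: expected 2-4, found {count}")
    if not artifact.decider_summary.strip():
        problems.append("decider_summary: missing")
    return problems
```

A check like this could run during the 16:30-16:45 tidy-up to catch an empty bucket or a missing Decider summary before the review starts.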

## Friday Time Structure

Friday is the longest day: customer interviews start early (canonically 09:00 PT) and the Decider review concludes the day (canonically 17:30 PT).

- **09:00-16:30**: 5 customer interviews of 50-60 minutes each at 09:00 / 10:30 / 12:00 / 14:00 / 15:30. Each slot: 10 min setup + 50-55 min interview + 5 min team huddle to capture observations before next customer.
- **13:00-14:00**: Lunch (slot 3 wraps at about 13:00; lunch fills the buffer between slot 3 and slot 4).
- **16:30-16:45**: Last-customer wrap; observation note tidy
- **16:45-17:00**: Team writes hot takes silently in parallel
- **17:00-17:30**: Decider reviews scorecard + hot takes; makes the call
- **17:30-18:00**: Decider summary captured; team begins post-sprint disposition (next-step calendar, downstream deliverable assignment)

This skill's 270-minute timebox covers the synthesis sections (scorecard, patterns, hot takes, Decider summary). The 5 interviews themselves (~5 hours of interview time) run in parallel with continuous observation capture.
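
The slot arithmetic in the schedule above can be checked with a short sketch. It assumes each slot runs the full 60 minutes (the upper end of the stated 50-60 minute range) and uses the five canonical start times; `slot_windows` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

SLOT_STARTS = ["09:00", "10:30", "12:00", "14:00", "15:30"]
SLOT_LENGTH = timedelta(minutes=60)  # assumption: upper bound of the 50-60 minute range


def slot_windows(starts=SLOT_STARTS, length=SLOT_LENGTH):
    """Return (start, end, gap_to_next_in_minutes) per slot; the last gap is None."""
    times = [datetime.strptime(s, "%H:%M") for s in starts]
    windows = []
    for i, start in enumerate(times):
        end = start + length
        gap = None
        if i + 1 < len(times):
            gap = int((times[i + 1] - end).total_seconds() // 60)
        windows.append((start.strftime("%H:%M"), end.strftime("%H:%M"), gap))
    return windows


for start, end, gap in slot_windows():
    print(start, end, gap)
```

Under these assumptions the gaps come out at 30 minutes everywhere except the 60-minute gap after slot 3, which is the lunch hour, and the last slot ends at 16:30.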

## Scorecard Mechanic

The scorecard is a 2-D grid. Rows are sprint questions from Monday's map-and-target (typically 3-7). Columns are the 5 customers (anonymized IDs). Each cell answers: did this customer's interview validate, invalidate, or leave inconclusive the row's question?

| | C1 | C2 | C3 | C4 | C5 | Day-end decision |
|---|---|---|---|---|---|---|
| Q1 | Y | Y | N | Y | partial | Inconclusive (3 of 5 Y, 1 of 5 N) |
| Q2 | N | Y | unclear | N | N | Inconclusive (3 of 5 N, 1 of 5 Y) |
| ... | ... | ... | ... | ... | ... | ... |

Day-end decision rules:
- **Validated**: 4 or 5 of 5 Y (strong signal); 3 of 5 Y with no N (directional). For 4-customer cohorts: 4 Y is Validated; 3 Y with no N is directional.
- **Invalidated**: 4 or 5 of 5 N. For 4-customer cohorts: 4 N is Invalidated; 3 N with no Y is directional.
- **Inconclusive**: all other patterns. Inconclusive questions get scheduled for follow-up (a smaller test, a quant experiment, or a second Design Sprint).

The Decider can override day-end decisions but should record reasoning.
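
The day-end decision rules can be expressed as a small function, shown here as a minimal sketch. The function name and return strings are assumptions. It reads "directional" as a weaker form of the same decision, and it follows the rules literally: for a 5-customer cohort, 3 N with no Y falls through to Inconclusive because the rules only name that pattern for 4-customer cohorts. Decider overrides are not modeled.

```python
from collections import Counter


def day_end_decision(cells):
    """Map one scorecard row (cell values for 4 or 5 customers) to a decision.

    Cell values are expected to be "Y", "N", "partial", or "unclear".
    """
    n = len(cells)
    if n not in (4, 5):
        raise ValueError("the rules cover 4- or 5-customer cohorts only")
    counts = Counter(cells)
    yes, no = counts["Y"], counts["N"]
    if yes >= 4:
        return "Validated"
    if yes == 3 and no == 0:
        return "Validated (directional)"
    if no >= 4:
        return "Invalidated"
    if n == 4 and no == 3 and yes == 0:
        # Stated only for 4-customer cohorts in the rules above.
        return "Invalidated (directional)"
    return "Inconclusive"


print(day_end_decision(["Y", "Y", "Y", "partial", "unclear"]))  # Validated (directional)
print(day_end_decision(["N", "N", "N", "N", "unclear"]))        # Invalidated
```

Counting only exact "Y" and "N" cells means "partial" and "unclear" never push a question over a threshold. That is a conservative reading of the rules, and a team could reasonably choose otherwise.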

## Common Pitfalls

- **Observation notes too narrative, not behavioral.** "Customer seemed confused" is a narrative; "Customer hovered on the capture button for 4 seconds without tapping, then tapped twice in rapid succession" is behavior. Behavior is data; narrative is interpretation.
- **Scorecard cells filled in by consensus.** Each observer writes their ce