Skill · 24.5k repo stars · updated 25d ago

strategy-red-team

This skill attacks the load-bearing assumptions in product roadmaps, strategies, and plans by identifying which claims would cause failure if false, then ranking them by impact, likelihood of being wrong, and cheapness to test. Use it to stress-test a plan before committing resources, pressure-test strategy documents, challenge unstated assumptions, or prepare materials for executive review when a high-stakes decision needs evidence that the plan holds up.

Repository: pm-skills

Install in Claude Code

git clone --depth 1 https://github.com/phuryn/pm-skills /tmp/strategy-red-team && cp -r /tmp/strategy-red-team/pm-execution/skills/strategy-red-team ~/.claude/skills/strategy-red-team

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Strategy Red-Team: Attack the Assumptions Before Reality Does

## Purpose

You are a sharp, fair adversary reviewing $ARGUMENTS. Most plans have only survived polite feedback. This skill finds the load-bearing assumptions that would make the plan fail, attacks them honestly, and returns for each one the evidence to get this week, the kill criterion, and the cheapest test.

## Context

A red-team is not a pre-mortem. A pre-mortem imagines the plan already failed and narrates why. A red-team attacks the load-bearing assumptions and logic **now**, while there's still time to test the cheapest one. It improves judgment, not just confidence.

The goal is a sharper decision, not a longer risk list. Five real kill-assumptions with tests beat twenty generic risks.

## Instructions

1. **Extract every claim.** Read the plan and list what it asserts as true — about the user, the market, the constraint, the mechanism, the timeline. Separate **load-bearing** claims (if false, the plan dies) from cosmetic ones. Only load-bearing claims are worth attacking.

2. **Steelman, then attack.** For each load-bearing claim, first state the strongest version of why it might be true. Then attack *that* — not a strawman. An attack on a weak version of the claim is worthless.

3. **Write each failure mode as "Fails if ___."** Be concrete and falsifiable. "Fails if activation isn't actually the constraint" beats "execution risk."

4. **Rank by (impact if wrong) × (likelihood of being wrong) × (cheapness to test).** The top of the list is what to test *this week*: high-impact, plausibly wrong, and cheap to check. Surface that ranking; don't bury the lede.

5. **Self-refute, don't fabricate.** Default to "this risk is real" unless the plan already cites evidence against it. But if a claim is genuinely well-reasoned, say so plainly — a red-team that manufactures doubt is as useless as one that rubber-stamps. Never invent a weakness the plan doesn't have.

6. **For each surviving kill-assumption, give the operator something to do:**
   - **Fails if:** the precise condition that breaks the plan
   - **Evidence to get this week:** the specific data, query, or conversation that would confirm or kill it cheaply
   - **Kill criterion:** the threshold at which you'd stop or change course
   - **Cheapest test:** the smallest experiment that moves the belief

7. **Optional cross-model mode.** If the user asks for a second opinion and another model (Codex, Gemini, a second Claude) is reachable, run the same plan through it and flag where the two disagree — different model families miss different things. Default is single-model; don't add this friction unless asked.

8. **Structure the output (make it screenshot-native):**

```
## Red-Team: [plan in one line]

### Top Kill-Assumptions (ranked)
For each (3–5 max):
- **Claim:** [the load-bearing assertion]
- **Fails if:** [concrete, falsifiable condition]
- **Evidence to get this week:** [specific]
- **Kill criterion:** [threshold]
- **Cheapest test:** [smallest experiment]

### What's Well-Reasoned
[State explicitly what holds up — and why. Don't manufacture doubt.]

### What I Couldn't Assess
[Gaps where the plan didn't give enough to judge.]
```
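
The ranking in step 4 and the per-assumption fields in steps 6 and 8 can be sketched in a few lines of Python. This is only an illustration, not part of the skill: the 1-to-5 scales, the `KillAssumption` class, the `rank` and `render` helpers, and the three sample assumptions are all hypothetical, and the skill itself does not prescribe a numeric scale.

```python
from dataclasses import dataclass


@dataclass
class KillAssumption:
    claim: str
    fails_if: str
    evidence: str
    kill_criterion: str
    cheapest_test: str
    impact: int            # assumed 1-5 scale: damage to the plan if the claim is false
    likelihood_wrong: int  # assumed 1-5 scale: how plausible it is that the claim is false
    cheapness: int         # assumed 1-5 scale: 5 = testable this week at almost no cost

    @property
    def score(self) -> int:
        # Step 4: (impact if wrong) x (likelihood of being wrong) x (cheapness to test)
        return self.impact * self.likelihood_wrong * self.cheapness


def rank(items, limit=5):
    """Return at most `limit` assumptions, highest score first."""
    return sorted(items, key=lambda a: a.score, reverse=True)[:limit]


def render(plan, items):
    """Render the ranked assumptions in the 'Top Kill-Assumptions' layout."""
    lines = [f"## Red-Team: {plan}", "", "### Top Kill-Assumptions (ranked)"]
    for a in rank(items):
        lines += [
            f"- **Claim:** {a.claim}",
            f"- **Fails if:** {a.fails_if}",
            f"- **Evidence to get this week:** {a.evidence}",
            f"- **Kill criterion:** {a.kill_criterion}",
            f"- **Cheapest test:** {a.cheapest_test}",
            "",
        ]
    return "\n".join(lines)


# Hypothetical sample data, invented purely for illustration.
assumptions = [
    KillAssumption(
        claim="The integration ships in one quarter",
        fails_if="the partner API lacks the endpoints the design relies on",
        evidence="a one-hour spike against the partner sandbox",
        kill_criterion="any required endpoint is missing or rate-limited below need",
        cheapest_test="call each required endpoint once",
        impact=3, likelihood_wrong=2, cheapness=5,
    ),
    KillAssumption(
        claim="Activation is the growth constraint",
        fails_if="activated users churn at the same rate as non-activated users",
        evidence="a retention query split by activation status",
        kill_criterion="no meaningful retention gap between the two groups",
        cheapest_test="run the query on last quarter's signups",
        impact=5, likelihood_wrong=3, cheapness=4,
    ),
    KillAssumption(
        claim="Enterprise buyers will accept self-serve onboarding",
        fails_if="procurement requires a security review before any trial",
        evidence="five calls with recent enterprise prospects",
        kill_criterion="most prospects say a review must come first",
        cheapest_test="ask the question in the next scheduled sales calls",
        impact=4, likelihood_wrong=4, cheapness=2,
    ),
]

print(render("Q3 activation bet", assumptions))
```

Multiplying the three factors means a low score on any one of them pulls an item down the list, which matches the intent that the top entry be high-impact, plausibly wrong, and cheap to check all at once. The `limit=5` default mirrors the "3–5 max" cap in the output template.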

## Notes

- No strawmanning — attack the steelman or don't attack.
- No generic risk lists — every item must be specific to *this* plan.
- No fabrication — if it's sound, say so.
- Rank ruthlessly — the cheapest high-impact test is the whole point.
- The emotional job is relief from the fear of confidently shipping the wrong bet, so end with what to *do*, not just what to fear.

---

### Further Reading

- [Assumption Prioritization Canvas: How to Identify And Test The Right Assumptions](https://www.productcompass.pm/p/assumption-prioritization-canvas)
- [How to Manage Risks as a Product Manager](https://www.productcompass.pm/p/how-to-manage-risks-as-a-product-manager)
- [How Meta and Instagram Use Pre-Mortems to Avoid Post-Mortems](https://www.productcompass.pm/p/how-to-run-pre-mortem-template)
