Skill208 repo starsupdated today

product-analytics

Product Analytics provides frameworks for converting raw product data into ship/extend/kill decisions through A/B test evaluation, cohort retention analysis, funnel analysis, and statistical foundations. Use this skill when comparing experiment variants, measuring feature adoption rates, diagnosing conversion drop-offs, sizing samples for statistical power, or determining whether product changes warrant shipping based on confidence intervals and significance testing.

View source Repository: orchestkit

Install in Claude Code

Copy

git clone --depth 1 https://github.com/yonatangross/orchestkit /tmp/product-analytics && cp -r /tmp/product-analytics/plugins/ork/skills/product-analytics ~/.claude/skills/product-analytics

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Product Analytics

Frameworks for turning raw product data into ship/extend/kill decisions. Covers A/B testing, cohort retention, funnel analysis, and the statistical foundations needed to make those decisions with confidence.

## Quick Reference

| Category | Rules | Impact | When to Use |
|----------|-------|--------|-------------|
| [A/B Test Evaluation](#ab-test-evaluation) | 1 | HIGH | Comparing variants, measuring significance, shipping decisions |
| [Cohort Retention](#cohort-retention) | 1 | HIGH | Feature adoption curves, day-N retention, engagement scoring |
| [Funnel Analysis](#funnel-analysis) | 1 | HIGH | Drop-off diagnosis, conversion optimization, stage mapping |
| [Statistical Foundations](#statistical-foundations) | 1 | HIGH | p-value interpretation, sample sizing, confidence intervals |

**Total: 4 rules across 4 categories**

## A/B Test Evaluation

Load `rules/ab-test-evaluation.md` for the full framework. Quick pattern:

```markdown
## Experiment: [Name]

Hypothesis: If we [change], then [primary metric] will [direction] by [amount]
  because [evidence or reasoning].

Sample size: [N per variant] — calculated for MDE=[X%], power=80%, alpha=0.05
Duration: [Minimum weeks] — never stop early (peeking bias)

Results:
  Control:   [metric value]  n=[count]
  Treatment: [metric value]  n=[count]
  Lift:      [+/- X%]        p=[value]  95% CI: [lower, upper]

Decision: SHIP / EXTEND / KILL
  Rationale: [One sentence grounded in numbers, not gut feel]
```

**Decision rules:**
- **SHIP** — p < 0.05, CI excludes zero, no guardrail regressions
- **EXTEND** — trending positive but underpowered (add runtime, not reanalysis)
- **KILL** — null result or guardrail degradation

See `rules/ab-test-evaluation.md` for sample size formulas, SRM checks, and pitfall list.

## Cohort Retention

Load `rules/cohort-retention.md` for full methodology. Quick pattern:

```sql
-- Day-N retention cohort query
SELECT
  DATE_TRUNC('week', first_seen)  AS cohort_week,
  COUNT(DISTINCT user_id)         AS cohort_size,
  COUNT(DISTINCT CASE
    WHEN activity_date = first_seen + INTERVAL '7 days'
    THEN user_id END) * 100.0
    / COUNT(DISTINCT user_id)     AS day_7_retention
FROM user_activity
GROUP BY 1
ORDER BY 1;
```

**Retention benchmarks (SaaS):**
- Day 1: 40–60% is healthy
- Day 7: 20–35% is healthy
- Day 30: 10–20% is healthy
- Flat curve after day 30 = product-market fit signal

See `rules/cohort-retention.md` for behavior-based cohorts, feature adoption curves, and engagement scoring.

## Funnel Analysis

Load `rules/funnel-analysis.md` for full methodology. Quick pattern:

```markdown
## Funnel: [Name] — [Date Range]

Stage 1: [Aware / Land]     → [N] users    (entry)
Stage 2: [Activate / Sign]  → [N] users    ([X]% from stage 1)
Stage 3: [Engage / Use]     → [N] users    ([X]% from stage 2)  ← biggest drop
Stage 4: [Convert / Pay]    → [N] users    ([X]% from stage 3)

Overall conversion: [X]%
Biggest drop-off:  Stage 2→3 ([X]% loss) — investigate first
```

**Optimization order:** Fix the largest drop-off first. A 5-point improvement at a high-volume step is worth more than a 20-point improvement at a low-volume step.

See `rules/funnel-analysis.md` for segmented funnels, micro-conversion tracking, and prioritization patterns.

## Statistical Foundations

Plain-English explanations of the stats every PM needs. Load `references/stats-cheat-sheet.md` for formulas and quick lookups.

**p-value in plain English:** The probability that you would see a result this extreme (or more extreme) if the change had zero effect. p=0.03 means a 3% chance you're looking at random noise. It does NOT mean "97% probability the change works."

**Confidence interval in plain English:** The range where the true effect probably lives. "Lift = +8%, 95% CI [+2%, +14%]" means you are fairly confident the real lift is somewhere between 2% and 14%. If the CI includes zero, you cannot claim a win.

**Minimum Detectable Effect (MDE):** The smallest lift you care about detecting. Setting MDE too small forces impractically large sample sizes. Anchor MDE to business value — if a 2% lift is not worth shipping, set MDE = 5%.

**Statistical vs practical significance:** A result can be statistically significant (p < 0.05) but practically meaningless (lift = 0.01%). Always check both. A 0.01% lift that costs 6 weeks of eng time is not a win.

## Common Pitfalls

1. **Peeking** — stopping an experiment early because results look good inflates false-positive rate. Commit to a runtime before launch.
2. **Multiple comparisons** — testing 10 metrics at p < 0.05 means ~1 false positive by chance. Apply Bonferroni correction or pre-register your primary metric.
3. **Sample Ratio Mismatch (SRM)** — if variant group sizes differ from expected split by > 1%, your experiment is broken. Fix before analyzing results.
4. **Novelty effect** — new features get inflated engagement in week 1. Run experiments long enough to see settled behavior (minimum 2 full business cycles).
5. **Simpson's paradox** — aggregate results can reverse when segmented. Always check results by key segments (device, plan tier, geography).

## Ship / Extend / Kill Framework

| Signal | Decision | Action |
|--------|----------|--------|
| p < 0.05, CI excludes zero, guardrails green | SHIP | Full rollout, update success metrics |
| Positive trend, underpowered (p = 0.10–0.15) | EXTEND | Add runtime, do not peek again |
| p > 0.15, flat or negative | KILL | Revert, document learnings, re-hypothesize |
| Guardrail regression, any p-value | KILL | Immediate revert regardless of primary metric |
| SRM detected | INVALID | Fix assignment bug, restart experiment |

## Related Skills

- `ork:product-frameworks` — OKRs, KPI trees, RICE prioritization, PRD templates
- `ork:metrics-instrumentation` — Event naming, metric definition, alerting setup
- `ork:brainstorm` — Generate hypotheses and experiment ideas
- `ork:assess` — Evaluate product quality and risks

## References