calc-sample-size

This skill guides medical researchers through sample size and power calculations. It walks the user interactively through a decision tree to select the appropriate statistical test, then generates reproducible R and Python code with effect size interpretations and IRB-ready justification text. Use it during study design when planning prospective trials, retrospective cohorts, or diagnostic studies that require formal sample size justification.

Repository: medsci-skills

Install in Claude Code

git clone --depth 1 https://github.com/Aperivue/medsci-skills /tmp/calc-sample-size && cp -r /tmp/calc-sample-size/skills/calc-sample-size ~/.claude/skills/calc-sample-size

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Calc-Sample-Size Skill

You are assisting a medical researcher with sample size and power calculations. Guide the user
through test selection using the decision tree, generate reproducible code in R (primary) and
Python (alternative), interpret effect sizes clinically, and produce IRB-ready justification text.

## Reference Files

- **Formulas**: `${CLAUDE_SKILL_DIR}/references/formulas.md` -- mathematical formulas, R/Python functions, effect size conventions
- **Observational cohort precision branch**: `${CLAUDE_SKILL_DIR}/references/observational_cohort.md`
- **Justification prose exemplars**: `${CLAUDE_SKILL_DIR}/references/justification_examples.md` -- reviewer-safe IRB/Methods justification paragraphs per design (proportions, means, DTA precision, survival/log-rank, ICC agreement, non-inferiority), each stating the five required elements; load when producing the justification text
- **Existing R template**: See `analyze-stats` skill at `references/templates/sample_size.R` for the 7 original tests

Read `formulas.md` before generating calculation code.
For retrospective observational cohorts with a fixed extract, also read `${CLAUDE_SKILL_DIR}/references/observational_cohort.md` and report the event budget and confidence-interval precision rather than forcing a prospective, recruitment-style power calculation.

## Cross-Skill References

- **design-study** calls **calc-sample-size** when a sample size justification is needed during study design.
- **calc-sample-size** output feeds into **write-protocol** and **write-paper** (Methods section).
- Detailed formulas and references are in `${CLAUDE_SKILL_DIR}/references/formulas.md`.

---

## Decision Tree

When the user requests a sample size calculation, walk them through this tree interactively.
Ask one question at a time. Do not assume answers.

```
What is your primary outcome?
|
+-- Binary (yes/no, positive/negative)
|   |
|   +-- Paired data (same subjects, two methods)?
|   |   +-- YES --> [5] McNemar test
|   |   +-- NO  --> How many groups?
|   |       +-- 2 groups, superiority     --> [4] Two-proportion comparison (chi-square)
|   |       +-- 2 groups, non-inferiority --> [10] Non-inferiority / equivalence
|   |       +-- Multivariable model       --> [9] Logistic regression
|   |
+-- Continuous (measurement, score)
|   |
|   +-- How many groups?
|       +-- 2 groups  --> [6] Independent t-test
|       +-- 3+ groups --> [8] One-way ANOVA
|
+-- Time-to-event (survival, recurrence)
|   |
|   +-- Two groups, unadjusted      --> [7] Log-rank test
|   +-- Multivariable / adjusted HR  --> [7] Log-rank (Schoenfeld) + [11] Cox EPV
|
+-- Agreement (inter-rater, reproducibility)
|   |
|   +-- Continuous measurements --> [2] ICC
|   +-- Categorical ratings     --> [3] Kappa
|
+-- Diagnostic accuracy (Se, Sp, AUC precision)
    |
    +--> [1] Diagnostic accuracy (precision-based)
```
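
The tree above can be read as a lookup from (outcome, branch) to the bracketed test numbers. The sketch below is one hypothetical encoding: the key names, `TREE`, and `select_tests` are illustrative assumptions and are not part of the skill, which asks these questions interactively rather than through a function call.

```python
# Hypothetical encoding of the decision tree; all names here are assumptions.
# Values are the bracketed test numbers shown in the tree.
TREE = {
    ("binary", "paired"): [5],                      # McNemar test
    ("binary", "two_groups_superiority"): [4],      # Two-proportion comparison
    ("binary", "two_groups_noninferiority"): [10],  # Non-inferiority / equivalence
    ("binary", "multivariable"): [9],               # Logistic regression
    ("continuous", "two_groups"): [6],              # Independent t-test
    ("continuous", "three_plus_groups"): [8],       # One-way ANOVA
    ("time_to_event", "unadjusted"): [7],           # Log-rank test
    ("time_to_event", "adjusted"): [7, 11],         # Log-rank (Schoenfeld) + Cox EPV
    ("agreement", "continuous"): [2],               # ICC
    ("agreement", "categorical"): [3],              # Kappa
    ("diagnostic_accuracy", None): [1],             # Precision-based
}


def select_tests(outcome, branch=None):
    """Return the test numbers for one path through the tree."""
    try:
        return TREE[(outcome, branch)]
    except KeyError:
        raise ValueError(
            f"No path for outcome={outcome!r}, branch={branch!r}"
        ) from None


print(select_tests("time_to_event", "adjusted"))  # [7, 11]
```

Raising on an unknown path, rather than returning a default test, mirrors the instruction not to assume answers.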

---

## Supported Tests

### Test 1: Diagnostic Accuracy (Sensitivity/Specificity Precision)

**When to use**: Estimating required sample size for desired precision of sensitivity or specificity in a diagnostic accuracy study.

**Required parameters** (ask the user):
| Parameter | Description | Default |
|-----------|-------------|---------|
| `sensitivity_expected` | Expected sensitivity | 0.85 |
| `ci_half_width` | Desired half-width of 95% CI | 0.05 |
| `prevalence` | Disease prevalence in study population | 0.30 |
| `alpha` | Significance level | 0.05 |
| `attrition_rate` | Expected dropout/exclusion rate | 0.15 |

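**Effect size interpretation**: The CI half-width determines precision. A half-width of 0.05 means the 95% CI extends 5 percentage points either side of the estimate (e.g., 0.80 to 0.90 for an expected sensitivity of 0.85). Narrower CIs require larger samples. Because sensitivity is estimated only in participants with the disease, the total sample also depends on prevalence.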
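
As a rough illustration of how these parameters combine, the sketch below uses the normal-approximation (Wald) precision formula for a proportion, scales the diseased count up by prevalence, and then inflates for attrition. This is an assumption about the approach; the formulas in `formulas.md` may differ (for example by using an exact or Wilson interval). The function name and return keys are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def diagnostic_accuracy_n(sensitivity_expected=0.85, ci_half_width=0.05,
                          prevalence=0.30, alpha=0.05, attrition_rate=0.15):
    """Wald-approximation sample size for a sensitivity CI (a sketch, not the skill's code)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    # Diseased participants needed for the requested half-width.
    n_diseased = ceil(
        z ** 2 * sensitivity_expected * (1 - sensitivity_expected) / ci_half_width ** 2
    )
    # Total participants so that the expected diseased count is reached.
    n_total = ceil(n_diseased / prevalence)
    # Enrolment target after allowing for dropout/exclusion.
    n_enrolled = ceil(n_total / (1 - attrition_rate))
    return {"n_diseased": n_diseased, "n_total": n_total, "n_enrolled": n_enrolled}


print(diagnostic_accuracy_n())
```

With the table defaults this gives 196 diseased participants, 654 in total, and 770 to enrol. For specificity the same calculation would apply with the non-diseased fraction, `1 - prevalence`, in place of `prevalence`.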

---

### Test 2: ICC Agreement (Bonett 2002)

**When to use**: Inter-rater or intra-rater agreement for continuous measurements (e.g., tumor size, angle measurement).

**Required parameters**:
| Parameter | Description | Default |
|-----------|-------------|---------|
| `icc_expected` | Expected ICC value | 0.75 |
| `icc_null` | Null hypothesis ICC (lower bound) | 0.50 |
| `n_raters` | Number of raters | 2 |
| `alpha` | Significance level | 0.05 |
| `power` | Desired power | 0.80 |
| `attrition_rate` | Expected dropout rate | 0.10 |

**Effect size interpretation**: ICC < 0.50 = poor, 0.50-0.75 = moderate, 0.75-0.90 = good, > 0.90 = excellent (Koo & Li, 2016).
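
The parameters above (a null ICC and a target power) fit a hypothesis-testing formulation. One widely used approximation of that kind is the one by Walter, Eliasziw & Donner (1998), sketched below. This is a swapped-in technique for illustration, not necessarily the calculation in `formulas.md`, and the function name is hypothetical. It assumes a one-sided test of H0: ICC = `icc_null` against ICC > `icc_null`.

```python
from math import ceil, log
from statistics import NormalDist


def icc_subjects_needed(icc_expected=0.75, icc_null=0.50, n_raters=2,
                        alpha=0.05, power=0.80, attrition_rate=0.10):
    """Approximate number of subjects (Walter, Eliasziw & Donner 1998 style sketch)."""
    if not 0 <= icc_null < icc_expected < 1:
        raise ValueError("Require 0 <= icc_null < icc_expected < 1")
    if n_raters < 2:
        raise ValueError("At least two raters are required")
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    theta0 = icc_null / (1 - icc_null)
    theta1 = icc_expected / (1 - icc_expected)
    c0 = (1 + n_raters * theta0) / (1 + n_raters * theta1)
    n = 1 + 2 * (z_alpha + z_beta) ** 2 * n_raters / (log(c0) ** 2 * (n_raters - 1))
    n_subjects = ceil(n)
    # Inflate for expected dropout.
    n_enrolled = ceil(n_subjects / (1 - attrition_rate))
    return n_subjects, n_enrolled


print(icc_subjects_needed())
```

The required number of subjects falls as the gap between `icc_expected` and `icc_null` widens and as raters are added.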

---

### Test 3: Kappa Agreement (Donner & Eliasziw 1992)

**When to use**: Inter-rater agreement for categorical ratings (e.g., BI-RADS category, lesion present/absent).

**Required parameters**:
| Parameter | Description | Default |
|-----------|-------------|---------|
| `kappa_expected` | Expected kappa value | 0.70 |
| `kappa_null` | Null hypothesis kappa | 0.40 |
| `po_expected` | Expected proportion of agreement | 0.75 |
| `alpha` | Significance level | 0.05 |
| `power` | Desired power | 0.80 |
| `attrition_rate` | Expected dropout rate | 0.10 |

**Effect size interpretation**: Kappa < 0 = poor, 0.00-0.20 = slight, 0.21-0.40 = fair, 0.41-0.60 = moderate, 0.61-0.80 = substantial, 0.81-1.00 = almost perfect (Landis & Koch, 1977).
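
Because kappa is defined as (po - pe) / (1 - pe), an expected kappa and an expected observed agreement together imply a chance agreement pe. The sketch below is a plausibility check on those two inputs plus a Landis & Koch label; it is not the Donner & Eliasziw sample size calculation itself, and both function names are hypothetical.

```python
def implied_chance_agreement(kappa, po):
    """Solve kappa = (po - pe) / (1 - pe) for the chance agreement pe."""
    if not kappa < 1:
        raise ValueError("kappa must be below 1 to solve for pe")
    pe = (po - kappa) / (1 - kappa)
    if not 0 <= pe < 1:
        raise ValueError(f"Inputs imply pe={pe:.3f}; kappa and po are inconsistent")
    return pe


def landis_koch_label(kappa):
    """Map a kappa value to the Landis & Koch (1977) descriptive band."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


pe = implied_chance_agreement(kappa=0.70, po=0.75)
print(round(pe, 3), landis_koch_label(0.70), landis_koch_label(0.40))
```

With the table defaults the implied chance agreement is about 0.167. A negative value would indicate that the expected agreement is too low for the expected kappa.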

---

### Test 4: Two-Proportion Comparison (Chi-Square)

**When to use**: Comparing proportions between two independent groups (e.g., AI detection rate vs. conventional detection rate).

**Required parameters**:
| Parameter | Description | Default |
|-----------|-------------|---------|
| `p1` | Proportion in group 1 | -- |
| `p2` | Proportion in group 2 | -- |
| `alpha` | Significance level | 0.05 |
| `power` | Desired power | 0.80 |
| `attrition_rate` | Expected dropout rate | 0.15 |

**Effect size interpretation**: Cohen's h = 2 * arcsin(sqrt(p1)) - 2 * arcsin(sqrt(p2)). Interpret its absolute value: small = 0.20, medium = 0.50, large = 0.80.
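
A minimal sketch of how Cohen's h can drive the calculation, assuming equal group sizes, a two-sided test, and the normal approximation n per group = ((z_alpha/2 + z_beta) / h)^2. The proportions 0.80 and 0.65 are made-up illustration values (the table gives no defaults), the function names are hypothetical, and the skill's own code in `formulas.md` may use a different formula.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist


def cohens_h(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def two_proportion_n(p1, p2, alpha=0.05, power=0.80, attrition_rate=0.15):
    """Per-group n from Cohen's h (normal approximation, equal allocation)."""
    h = abs(cohens_h(p1, p2))
    if h == 0:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided
    z_beta = NormalDist().inv_cdf(power)
    n_per_group = ceil(((z_alpha + z_beta) / h) ** 2)
    # Inflate each group for expected dropout.
    n_enrolled_per_group = ceil(n_per_group / (1 - attrition_rate))
    return n_per_group, n_enrolled_per_group


# Illustrative values only: 80% vs. 65% detection rate.
print(round(cohens_h(0.80, 0.65), 2), two_proportion_n(0.80, 0.65))
```

For these illustrative inputs h is about 0.34, between the small and medium conventions, which gives 69 participants per group before attrition and 82 after.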

---

### Test 5: McNemar Test (Paired Proportions)

**When to use**: Paired binary outcomes (e.g., two readers reading same cases, before/after on same patients).

**Required parameters**:
| Parameter | Description | Default |
|-----------|-------------|---------|
| `p01` | P(Method A
