
statistical-analysis


Install in Claude Code
git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/statistical-analysis && cp -r /tmp/statistical-analysis/skills/biostatistics/statistical-analysis ~/.claude/skills/statistical-analysis
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Statistical Analysis

## Overview

Statistical analysis is the systematic process of selecting appropriate tests, verifying assumptions, quantifying effect sizes, and reporting results. This guide covers test selection, assumption diagnostics, and APA-style reporting for frequentist and Bayesian analyses in academic research.

## Key Concepts

### Frequentist vs Bayesian Framework

| Aspect | Frequentist | Bayesian |
|--------|-------------|----------|
| Core output | p-value, confidence interval | Posterior distribution, credible interval |
| Interpretation | "How likely is data at least this extreme if H0 is true?" | "How likely is H1 given the data and the prior?" |
| Null support | Cannot support H0 (only fail to reject) | Can quantify evidence for H0 via Bayes Factor |
| Prior info | Not used | Incorporated via prior distributions |
| Sample size | Requires adequate power | Can be applied at any sample size, but small samples make results sensitive to the prior |
| Best for | Standard analyses, large samples | Small samples, prior info, complex models |

### Statistical vs Practical Significance

A statistically significant result (p < .05) can correspond to an effect that is trivially small in practice, especially in large samples. Always report:
- **Effect size**: Magnitude of the effect (Cohen's d, eta-squared, r, R-squared)
- **Confidence interval**: Precision of the estimate
- **Context**: Clinical/practical relevance in the domain

### Common Effect Sizes

| Test | Effect Size | Small | Medium | Large |
|------|-------------|-------|--------|-------|
| t-test | Cohen's d | 0.20 | 0.50 | 0.80 |
| t-test (small n) | Hedges' g | 0.20 | 0.50 | 0.80 |
| ANOVA | partial eta-squared | 0.01 | 0.06 | 0.14 |
| ANOVA | omega-squared | 0.01 | 0.06 | 0.14 |
| Correlation | r | 0.10 | 0.30 | 0.50 |
| Regression | R-squared | 0.02 | 0.13 | 0.26 |
| Regression | f-squared | 0.02 | 0.15 | 0.35 |
| Chi-square | Cramer's V (for min(rows, cols) - 1 = 2) | 0.07 | 0.21 | 0.35 |
| Chi-square 2x2 | phi coefficient | 0.10 | 0.30 | 0.50 |

Cohen's benchmarks are guidelines, not rigid thresholds -- domain context always matters.
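
As a minimal sketch of the first two rows of the table, the following computes Cohen's d with a pooled standard deviation and applies the common approximate small-sample correction to obtain Hedges' g. The function names and the sample data are illustrative assumptions, not part of this skill.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def hedges_g(group_a, group_b):
    """Hedges' g: Cohen's d scaled by the approximate correction 1 - 3 / (4 * df - 1)."""
    df = len(group_a) + len(group_b) - 2
    return cohens_d(group_a, group_b) * (1 - 3 / (4 * df - 1))


def benchmark_label(d):
    """Place |d| on Cohen's benchmarks (0.20 / 0.50 / 0.80) from the table."""
    size = abs(d)
    if size < 0.20:
        return "below small"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"


# Hypothetical data, for illustration only.
treated = [2, 4, 6]
control = [1, 3, 5]
d = cohens_d(treated, control)
g = hedges_g(treated, control)
print(f"d = {d:.2f} ({benchmark_label(d)}), g = {g:.2f}")
```

With samples this small the correction is substantial, which is why the table lists Hedges' g separately for small n.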

### Assumptions Overview

Most parametric tests require:
1. **Independence**: Observations are independent of each other
2. **Normality**: Data (or residuals) are approximately normally distributed
3. **Homogeneity of variance**: Groups have similar variances (for group comparisons)
4. **Linearity**: Relationship between variables is linear (for regression)

When assumptions are violated:
- **Normality violated, n >= 30 per group**: Proceed -- by the central limit theorem, parametric tests on means are robust with large samples, barring extreme skew or outliers
- **Normality violated, n < 30**: Use non-parametric alternative
- **Variance heterogeneity**: Use Welch's correction (t-test) or Welch's ANOVA
- **Linearity violated**: Add polynomial terms, transform variables, or use GAMs

### Test-Specific Assumption Workflows

**T-test assumptions**: (1) Check normality per group with Shapiro-Wilk + Q-Q plots. (2) Check homogeneity with Levene's test. (3) If normality violated: Mann-Whitney U (independent) or Wilcoxon signed-rank (paired). If variance heterogeneity: use Welch's t-test.
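
The t-test workflow above can be sketched for the independent-groups case as follows. This assumes SciPy is installed; the function name, the 0.05 threshold, and the data are illustrative assumptions. It covers only the formal tests, so Q-Q plots still need to be inspected separately.

```python
from scipy import stats


def compare_two_independent_groups(a, b, alpha=0.05):
    """Run the assumption checks, then the matching two-group test.

    Returns (test_name, statistic, p_value).
    """
    # Step 1: Shapiro-Wilk normality check per group.
    normal = all(stats.shapiro(group).pvalue > alpha for group in (a, b))
    if not normal:
        result = stats.mannwhitneyu(a, b, alternative="two-sided")
        return "Mann-Whitney U", result.statistic, result.pvalue

    # Step 2: Levene's test for homogeneity of variance.
    equal_var = stats.levene(a, b).pvalue > alpha

    # Step 3: Student's t-test if variances look similar, otherwise Welch's.
    result = stats.ttest_ind(a, b, equal_var=equal_var)
    name = "Student's t-test" if equal_var else "Welch's t-test"
    return name, result.statistic, result.pvalue


# Hypothetical data: the first group has an extreme outlier.
group_a = [1, 1, 1, 1, 2, 2, 2, 3, 3, 40]
group_b = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7]
name, statistic, p_value = compare_two_independent_groups(group_a, group_b)
print(name, round(float(p_value), 4))
```

For paired data the same structure applies, with the normality check run on the pairwise differences and `stats.ttest_rel` or `stats.wilcoxon` as the final step.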

**ANOVA assumptions**: (1) Normality per group. (2) Homogeneity via Levene's test. (3) For repeated measures: check sphericity (Mauchly's test); if violated, apply Greenhouse-Geisser (epsilon < 0.75) or Huynh-Feldt (epsilon > 0.75) correction. (4) If normality violated: Kruskal-Wallis (independent) or Friedman (repeated).
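
The sphericity rule in step (3) can be written as a small decision helper. This is a sketch under stated assumptions: it takes Mauchly's p-value and both epsilon estimates as inputs computed elsewhere, the function name is hypothetical, and the case of epsilon exactly 0.75 (which the rule leaves open) is assigned to Huynh-Feldt here.

```python
def sphericity_correction(mauchly_p, gg_epsilon, hf_epsilon, df_effect, df_error, alpha=0.05):
    """Choose a sphericity correction and return (name, adjusted df_effect, adjusted df_error)."""
    if mauchly_p > alpha:
        # Sphericity not rejected: keep the uncorrected degrees of freedom.
        return "none", float(df_effect), float(df_error)
    if gg_epsilon < 0.75:
        name, epsilon = "Greenhouse-Geisser", gg_epsilon
    else:
        # The Huynh-Feldt estimate can exceed 1, so it is capped at 1.
        name, epsilon = "Huynh-Feldt", min(hf_epsilon, 1.0)
    # Both corrections multiply the numerator and denominator df by epsilon.
    return name, epsilon * df_effect, epsilon * df_error


# Hypothetical values for a 3-condition design with 10 subjects.
print(sphericity_correction(mauchly_p=0.01, gg_epsilon=0.6, hf_epsilon=0.7,
                            df_effect=2, df_error=18))
```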

**Linear regression assumptions**: (1) Linearity via residuals-vs-fitted plot. (2) Independence via Durbin-Watson test (1.5-2.5 acceptable). (3) Homoscedasticity via Breusch-Pagan test + scale-location plot. (4) Normality of residuals via Q-Q plot + Shapiro-Wilk. (5) Multicollinearity via VIF (>10 = severe, >5 = moderate).
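
Two of the diagnostics above reduce to short formulas. The sketch below computes the Durbin-Watson statistic from a residual series and the VIF for the two-predictor case, where VIF = 1 / (1 - r^2) with r the Pearson correlation between the predictors. With more predictors, r^2 is replaced by the R-squared from regressing each predictor on the others, which this sketch does not cover. Function names and data are illustrative assumptions.

```python
import math


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a model with exactly two predictors."""
    return 1 / (1 - pearson_r(x1, x2) ** 2)


# Hypothetical residuals that alternate in sign (negative autocorrelation).
dw = durbin_watson([1, -1, 1, -1])
print("Durbin-Watson:", dw, "acceptable" if 1.5 <= dw <= 2.5 else "check independence")

# Hypothetical predictors.
print("VIF:", round(vif_two_predictors([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 2))
```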

**Logistic regression assumptions**: (1) Independence. (2) Linearity of log-odds with continuous predictors (Box-Tidwell test). (3) No perfect multicollinearity (VIF). (4) Adequate sample size (10-20 events per predictor minimum).
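
The sample-size check in step (4) is simple arithmetic. The sketch below counts the rarer outcome class as the limiting number of events, which is a common convention but an assumption here; the function name and data are hypothetical.

```python
def events_per_predictor(outcomes, n_predictors):
    """Events per predictor, counting the rarer of the two outcome classes as 'events'."""
    events = sum(outcomes)  # outcomes are coded 0/1
    limiting = min(events, len(outcomes) - events)
    return limiting / n_predictors


# Hypothetical data set: 30 events among 200 observations, 3 candidate predictors.
outcomes = [1] * 30 + [0] * 170
epv = events_per_predictor(outcomes, n_predictors=3)
print(f"EPV = {epv:.1f}:", "meets the 10-20 guideline" if epv >= 10 else "too few events")
```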

### Specialized Test Categories

Beyond the main decision flowchart, several specialized test families address specific data types:

**Survival / time-to-event analysis**:
- **Log-rank test**: Compares survival curves between groups (non-parametric)
- **Cox proportional hazards**: Models time-to-event with covariates; assumes proportional hazards
- **Parametric survival models**: Weibull, exponential, log-normal for known distributional forms
- Use when outcome is time until an event (death, relapse, failure) with possible censoring
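
Censoring is what separates these methods from ordinary group comparisons. To illustrate how censored observations enter the calculation, here is a minimal Kaplan-Meier estimator, the survival-curve estimate that log-rank tests compare between groups. Kaplan-Meier is not listed above and is used here only as an illustration; the function name and data are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct time with an event.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        observed = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        # Censored subjects leave the risk set without lowering survival.
        at_risk -= leaving
    return curve


# Hypothetical follow-up data: the subject at time 2 is censored.
print(kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1]))
```

The censored subject still counts in the risk set at time 1, which is how partial follow-up contributes information.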

**Count outcome models**:
- **Poisson regression**: For count data where mean approximately equals variance
- **Negative binomial regression**: For overdispersed counts (variance > mean)
- **Zero-inflated models**: For excess zeros beyond what Poisson/NB predicts
- Use when outcome is a count (number of events, incidents, occurrences)
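
A quick descriptive check can suggest which of these models to try first. The sketch below reports the variance-to-mean ratio (about 1 under Poisson, above 1 when overdispersed) and compares the observed share of zeros with the share a Poisson distribution with the same mean would predict. It ignores covariates, so it is a rough screen rather than a formal test; the function name and data are assumptions.

```python
import math
import statistics


def count_diagnostics(counts):
    """Descriptive screen for overdispersion and excess zeros in count data."""
    mean = statistics.mean(counts)
    variance = statistics.variance(counts)  # sample variance
    return {
        "dispersion_ratio": variance / mean,
        "observed_zero_fraction": counts.count(0) / len(counts),
        "poisson_zero_fraction": math.exp(-mean),  # P(X = 0) for Poisson(mean)
    }


# Hypothetical counts with many zeros and one large value.
print(count_diagnostics([0, 0, 0, 10]))
```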

**Agreement and reliability**:
- **Cohen's kappa**: Inter-rater agreement for categorical ratings (2 raters)
- **Fleiss' kappa / Krippendorff's alpha**: Agreement for >2 raters
- **Intraclass correlation coefficient (ICC)**: Continuous ratings reliability
- **Cronbach's alpha**: Internal consistency of multi-item scales
- **Bland-Altman analysis**: Agreement between two measurement methods (continuous)
- Use when assessing measurement reliability or inter-rater consistency
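
Cohen's kappa corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch for two raters follows; the function name and ratings are illustrative assumptions, and the statistic is undefined when chance agreement equals 1.

```python
from collections import Counter


def cohens_kappa(rater_1, rater_2):
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    n = len(rater_1)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over labels.
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    expected = sum(counts_1[label] * counts_2[label] for label in counts_1) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of four items.
print(cohens_kappa(["y", "y", "n", "n"], ["y", "n", "n", "n"]))
```

Here the raters agree on 3 of 4 items (75%), but half of that agreement is expected by chance, so kappa is noticeably lower than the raw agreement.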

**Categorical data extensions**:
- **McNemar's test**: Paired binary outcomes (2x2)
- **Cochran's Q test**: Paired binary outcomes (3+ conditions)
- **Cochran-Armitage trend test**: Linear trend in the proportions of a binary outcome across ordered categories (2 x k table)
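
McNemar's test uses only the discordant pairs of the 2x2 table. The following sketch computes the exact (binomial) two-sided p-value from the two discordant counts; the function name and counts are illustrative assumptions, and large samples are often analyzed with the chi-square approximation instead.

```python
import math


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value.

    b: pairs positive under condition 1 only; c: pairs positive under condition 2 only.
    Under H0 the discordant pairs split as Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical paired study: 1 pair changed in one direction, 9 in the other.
print(round(mcnemar_exact(1, 9), 4))
```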

## Decision Framework

### Test Selection Flowchart

```
What is your research question?
|
+-- Comparing GROUPS on a continuous outcome?
|   |
|   +-- How many groups?
|   |   +-- 2 groups
|   |   |   +-- Independent -> Independent t-test (or Mann-Whitney U)
|   |   |   +-- Paired/repeated -> Paired t-test (or Wilcoxon signed-rank)
|   |   +-- 3+ groups
|   |      +-- Independent -> One-way ANOVA (or Kruskal-Wallis)
|   |      +-- Repeated -> Repeated-measures ANOVA (or Friedman)
|   |
|   +-- Multiple factors? -> Factorial ANOVA / Mi
```