Skill, 284 repo stars, updated 4d ago

statistical-analysis

This Claude Code skill provides guidance for selecting and executing statistical analyses across frequentist and Bayesian frameworks, including assumption verification, effect size calculation, and results reporting. Use it when designing research analyses, choosing between statistical approaches, interpreting p-values versus Bayesian credible intervals, understanding when effect sizes matter beyond statistical significance, or formatting academic research findings according to APA standards.

Repository: SciAgent-Skills

Install in Claude Code

git clone --depth 1 https://github.com/jaechang-hits/SciAgent-Skills /tmp/statistical-analysis && cp -r /tmp/statistical-analysis/skills/biostatistics/statistical-analysis ~/.claude/skills/statistical-analysis

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Statistical Analysis

## Overview

Statistical analysis is the systematic process of selecting appropriate tests, verifying assumptions, quantifying effect magnitudes, and reporting results. This skill guides test selection, assumption diagnostics, and APA-style reporting for frequentist and Bayesian analyses in academic research.

## Key Concepts

### Frequentist vs Bayesian Framework

| Aspect | Frequentist | Bayesian |
|--------|-------------|----------|
| Core output | p-value, confidence interval | Posterior distribution, credible interval |
| Interpretation | "How likely is data at least this extreme if H0 is true?" | "How likely is H1 given the data and the prior?" |
| Null support | Cannot support H0 (only fail to reject) | Can quantify evidence for H0 via Bayes Factor |
| Prior info | Not used | Incorporated via prior distributions |
| Sample size | Requires adequate power | Valid at any sample size, but small samples make results more prior-sensitive |
| Best for | Standard analyses, large samples | Small samples, prior info, complex models |
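
As a rough illustration of the two columns above, the sketch below analyses the same binomial data both ways. The counts (14 successes in 20 trials) and the flat Beta(1, 1) prior are assumptions chosen for the example, not values from this skill.

```python
from scipy import stats

# Hypothetical data: 14 successes in 20 trials (illustrative only)
successes, trials = 14, 20

# Frequentist: exact two-sided binomial test of H0: p = 0.5
p_value = stats.binomtest(successes, trials, p=0.5).pvalue

# Bayesian: a flat Beta(1, 1) prior (an assumption) combined with a
# binomial likelihood gives a Beta(1 + successes, 1 + failures) posterior
posterior = stats.beta(1 + successes, 1 + trials - successes)
ci_low, ci_high = posterior.ppf([0.025, 0.975])  # 95% credible interval
prob_above_half = posterior.sf(0.5)              # P(p > 0.5 | data, prior)

print(f"p-value under H0: {p_value:.3f}")
print(f"95% credible interval: ({ci_low:.2f}, {ci_high:.2f})")
print(f"P(p > 0.5 | data): {prob_above_half:.3f}")
```

The p-value is a statement about the data under H0, while the last line is a direct probability statement about the parameter, and it changes if a different prior is chosen.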

### Statistical vs Practical Significance

A statistically significant result (p < .05) can correspond to an effect too small to matter in practice, especially with large samples. Always report:
- **Effect size**: Magnitude of the effect (Cohen's d, eta-squared, r, R-squared)
- **Confidence interval**: Precision of the estimate
- **Context**: Clinical/practical relevance in the domain

### Common Effect Sizes

| Test | Effect Size | Small | Medium | Large |
|------|-------------|-------|--------|-------|
| t-test | Cohen's d | 0.20 | 0.50 | 0.80 |
| t-test (small n) | Hedges' g | 0.20 | 0.50 | 0.80 |
| ANOVA | partial eta-squared | 0.01 | 0.06 | 0.14 |
| ANOVA | omega-squared | 0.01 | 0.06 | 0.14 |
| Correlation | r | 0.10 | 0.30 | 0.50 |
| Regression | R-squared | 0.02 | 0.13 | 0.26 |
| Regression | f-squared | 0.02 | 0.15 | 0.35 |
| Chi-square | Cramer's V (for min(rows, cols) - 1 = 2) | 0.07 | 0.21 | 0.35 |
| Chi-square 2x2 | phi coefficient | 0.10 | 0.30 | 0.50 |

Cohen's benchmarks are guidelines, not rigid thresholds -- domain context always matters.
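
A minimal sketch of the first two rows of the table, assuming two independent groups and the pooled-variance form of Cohen's d. The function names are illustrative, and the Hedges' g correction uses the common approximation 1 - 3 / (4(n1 + n2) - 9).

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d for two independent groups, using the pooled SD."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

def hedges_g(x, y):
    """Cohen's d with an approximate small-sample bias correction."""
    n = len(x) + len(y)
    correction = 1 - 3 / (4 * n - 9)
    return cohens_d(x, y) * correction

# Hypothetical toy data
treatment = [2, 4, 6]
control = [1, 3, 5]
print(cohens_d(treatment, control), hedges_g(treatment, control))
```

With very small groups the correction is noticeable, which is why the table lists Hedges' g separately for small n.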

### Assumptions Overview

Most parametric tests require:
1. **Independence**: Observations are independent of each other
2. **Normality**: Data (or residuals) are approximately normally distributed
3. **Homogeneity of variance**: Groups have similar variances (for group comparisons)
4. **Linearity**: Relationship between variables is linear (for regression)

When assumptions are violated:
- **Normality violated, n >= 30**: Usually proceed -- by the central limit theorem, tests on means are fairly robust with large samples, unless skew or outliers are extreme
- **Normality violated, n < 30**: Use non-parametric alternative
- **Variance heterogeneity**: Use Welch's correction (t-test) or Welch's ANOVA
- **Linearity violated**: Add polynomial terms, transform variables, or use GAMs

### Test-Specific Assumption Workflows

**T-test assumptions**: (1) Check normality per group with Shapiro-Wilk + Q-Q plots. (2) Check homogeneity with Levene's test. (3) If normality violated: Mann-Whitney U (independent) or Wilcoxon signed-rank (paired). If variance heterogeneity: use Welch's t-test.
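
The t-test workflow above could be sketched with `scipy.stats` roughly as follows, for the independent-groups case only. The function name `compare_two_groups` and the use of a single alpha for the assumption checks are assumptions of this sketch; Q-Q plots still need to be inspected by eye.

```python
import numpy as np
from scipy import stats

def compare_two_groups(a, b, alpha=0.05):
    """Pick an independent two-group test from Shapiro-Wilk and Levene results."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    # Step 1: normality per group
    normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    if not normal:
        res = stats.mannwhitneyu(a, b, alternative="two-sided")
        return "Mann-Whitney U", res.pvalue
    # Step 2: homogeneity of variance
    equal_var = stats.levene(a, b).pvalue > alpha
    res = stats.ttest_ind(a, b, equal_var=equal_var)
    return ("Student's t-test" if equal_var else "Welch's t-test"), res.pvalue

# Hypothetical simulated data
rng = np.random.default_rng(0)
group_a = rng.normal(10.0, 2.0, size=40)
group_b = rng.normal(11.0, 2.0, size=40)
print(compare_two_groups(group_a, group_b))
```

Choosing the test from preliminary tests on the same data is a pragmatic convention rather than a guarantee, so treat the returned label as a suggestion to confirm against the plots.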

**ANOVA assumptions**: (1) Normality per group. (2) Homogeneity via Levene's test. (3) For repeated measures: check sphericity (Mauchly's test); if violated, apply Greenhouse-Geisser (epsilon < 0.75) or Huynh-Feldt (epsilon > 0.75) correction. (4) If normality violated: Kruskal-Wallis (independent) or Friedman (repeated).
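
For the independent-groups branch of the ANOVA workflow, a hedged sketch might look like this. The name `one_way` is illustrative, the Levene step and Welch's ANOVA are left out for brevity, and eta-squared is computed by hand as SS_between / SS_total (which equals partial eta-squared in a one-way design).

```python
import numpy as np
from scipy import stats

def one_way(groups, alpha=0.05):
    """One-way ANOVA with eta-squared, or Kruskal-Wallis if normality fails."""
    groups = [np.asarray(g, float) for g in groups]
    normal = all(stats.shapiro(g).pvalue > alpha for g in groups)
    if not normal:
        return "Kruskal-Wallis", stats.kruskal(*groups).pvalue, None
    p_value = stats.f_oneway(*groups).pvalue
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = np.sum((all_values - grand_mean) ** 2)
    return "One-way ANOVA", p_value, ss_between / ss_total

# Hypothetical data: three groups with shifted means
base = stats.norm.ppf(np.linspace(0.05, 0.95, 15))
print(one_way([base, base + 1.0, base + 2.0]))
```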

**Linear regression assumptions**: (1) Linearity via residuals-vs-fitted plot. (2) Independence via Durbin-Watson test (1.5-2.5 acceptable). (3) Homoscedasticity via Breusch-Pagan test + scale-location plot. (4) Normality of residuals via Q-Q plot + Shapiro-Wilk. (5) Multicollinearity via VIF (>10 = severe, >5 = moderate).
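
Two of the regression checks above reduce to short formulas. The sketch below hand-rolls the Durbin-Watson statistic and VIF with NumPy so the arithmetic is visible; it is an illustration under those definitions, not the statsmodels implementation, and the function names are chosen for this example.

```python
import numpy as np

def durbin_watson(resid):
    """Sum of squared successive differences over sum of squared residuals."""
    resid = np.asarray(resid, float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def vif(X):
    """VIF for each column of X (predictors only, no intercept column)."""
    X = np.asarray(X, float)
    result = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r_squared = 1 - np.sum(resid ** 2) / np.sum((target - target.mean()) ** 2)
        result.append(1.0 / (1.0 - r_squared))
    return np.array(result)

# Hypothetical predictors: x2 is almost a copy of x1, so both VIFs are large
x1 = np.arange(10.0)
x2 = x1 + 0.1 * (-1.0) ** np.arange(10)
print(vif(np.column_stack([x1, x2])))
```

A Durbin-Watson value near 2 suggests little first-order autocorrelation; values toward 0 or 4 suggest positive or negative autocorrelation respectively.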

**Logistic regression assumptions**: (1) Independence. (2) Linearity of log-odds with continuous predictors (Box-Tidwell test). (3) No perfect multicollinearity (VIF). (4) Adequate sample size (10-20 events per predictor minimum).

### Specialized Test Categories

Beyond the main decision flowchart, several specialized test families address specific data types:

**Survival / time-to-event analysis**:
- **Log-rank test**: Compares survival curves between groups (non-parametric)
- **Cox proportional hazards**: Models time-to-event with covariates; assumes proportional hazards
- **Parametric survival models**: Weibull, exponential, log-normal for known distributional forms
- Use when outcome is time until an event (death, relapse, failure) with possible censoring
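
To make censoring concrete, here is a minimal Kaplan-Meier estimator in NumPy. It is a teaching sketch with hypothetical toy data; the survival curves compared by the log-rank test are estimated this way, and dedicated libraries add confidence intervals and plotting.

```python
import numpy as np

def kaplan_meier(times, events):
    """Return event times and the Kaplan-Meier survival estimate at each.

    events[i] is True if the event was observed, False if censored.
    """
    times = np.asarray(times, float)
    events = np.asarray(events, bool)
    event_times = np.unique(times[events])
    survival, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)               # still under observation at t
        observed = np.sum((times == t) & events)   # events exactly at t
        s *= 1.0 - observed / at_risk
        survival.append(s)
    return event_times, np.array(survival)

# Hypothetical data: the subject at t=2 is censored
t, s = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
print(t, s)
```

The censored subject contributes to the risk set at t=1 and t=2 but never counts as an event, which is what a naive "fraction still alive" calculation would get wrong.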

**Count outcome models**:
- **Poisson regression**: For count data where mean approximately equals variance
- **Negative binomial regression**: For overdispersed counts (variance > mean)
- **Zero-inflated models**: For excess zeros beyond what Poisson/NB predicts
- Use when outcome is a count (number of events, incidents, occurrences)
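
A quick first look at which count model might fit is to compare the sample variance with the mean and the observed share of zeros with what a Poisson distribution of the same mean would give. This sketch ignores covariates, so it is only a marginal screen; the function name and toy data are illustrative.

```python
import numpy as np

def dispersion_summary(counts):
    """Marginal overdispersion and excess-zero screen for count data."""
    counts = np.asarray(counts, float)
    mean = counts.mean()
    return {
        "dispersion_index": counts.var(ddof=1) / mean,  # about 1 under Poisson
        "observed_zero_fraction": np.mean(counts == 0),
        "poisson_zero_fraction": np.exp(-mean),         # P(Y = 0) for Poisson(mean)
    }

# Hypothetical data with many zeros and one large count
print(dispersion_summary([0, 0, 0, 12]))
```

A dispersion index well above 1 points toward negative binomial regression, and an observed zero fraction far above the Poisson value points toward a zero-inflated model, subject to checking the fitted model's residuals.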

**Agreement and reliability**:
- **Cohen's kappa**: Inter-rater agreement for categorical ratings (2 raters)
- **Fleiss' kappa / Krippendorff's alpha**: Agreement for >2 raters
- **Intraclass correlation coefficient (ICC)**: Continuous ratings reliability
- **Cronbach's alpha**: Internal consistency of multi-item scales
- **Bland-Altman analysis**: Agreement between two measurement methods (continuous)
- Use when assessing measurement reliability or inter-rater consistency
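
Cohen's kappa corrects raw agreement for the agreement expected by chance. A minimal unweighted version for two raters, written out in NumPy with hypothetical ratings, could look like this:

```python
import numpy as np

def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters over the same items."""
    r1, r2 = np.asarray(rater1), np.asarray(rater2)
    categories = np.union1d(r1, r2)
    p_observed = np.mean(r1 == r2)
    # Chance agreement from each rater's marginal category proportions
    p_expected = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in categories)
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical ratings: raw agreement is 0.75, kappa is lower
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```

Kappa is undefined when both raters use a single identical category for every item, because expected agreement is then 1.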

**Categorical data extensions**:
- **McNemar's test**: Paired binary outcomes (2x2)
- **Cochran's Q test**: Paired binary outcomes (3+ conditions)
- **Cochran-Armitage trend test**: Trend in a binary outcome across ordered categories (2 x k contingency table)
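
McNemar's test depends only on the discordant pairs of the paired 2x2 table. One way to sketch the exact version is as a two-sided binomial test on those counts; the function name and the argument names `b` and `c` (the two discordant cells) are assumptions of this example.

```python
from scipy import stats

def mcnemar_exact(b, c):
    """Exact McNemar p-value from the two discordant-pair counts.

    b: pairs positive in condition 1 only; c: pairs positive in condition 2 only.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    # Under H0 each discordant pair falls in either cell with probability 0.5
    return stats.binomtest(min(b, c), n, p=0.5).pvalue

# Hypothetical counts
print(mcnemar_exact(0, 10))
print(mcnemar_exact(5, 5))
```

Concordant pairs do not enter the calculation, so a large table with few discordant pairs can still have low power.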

## Decision Framework

### Test Selection Flowchart

```
What is your research question?
|
+-- Comparing GROUPS on a continuous outcome?
|   |
|   +-- How many groups?
|   |   +-- 2 groups
|   |   |   +-- Independent -> Independent t-test (or Mann-Whitney U)
|   |   |   +-- Paired/repeated -> Paired t-test (or Wilcoxon signed-rank)
|   |   +-- 3+ groups
|   |      +-- Independent -> One-way ANOVA (or Kruskal-Wallis)
|   |      +-- Repeated -> Repeated-measures ANOVA (or Friedman)
|   |
|   +-- Multiple factors? -> Factorial ANOVA / Mi
```
