ClaudeWave
Skill · 65 repo stars · updated yesterday

statistical-modeling

Regression analysis, ANOVA, generalized linear models, Bayesian methods, and model selection. Covers the full modeling workflow from problem formulation through diagnostics -- linear regression, logistic regression, Poisson regression, mixed-effects models, prior specification, posterior inference, AIC/BIC comparison, cross-validation for model selection, and assumption checking. Use when fitting models, testing hypotheses, or selecting among competing statistical explanations.

Install in Claude Code
git clone --depth 1 https://github.com/Tibsfox/gsd-skill-creator /tmp/statistical-modeling && cp -r /tmp/statistical-modeling/examples/skills/data-science/statistical-modeling ~/.claude/skills/statistical-modeling
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Statistical Modeling

Statistical modeling is the practice of fitting mathematical structures to data in order to quantify relationships, test hypotheses, and make predictions. Unlike machine learning, which optimizes prediction, statistical modeling privileges interpretability and inference -- understanding *why* variables relate, not just *that* they do. Leo Breiman's "two cultures" paper (2001) crystallized this distinction. This skill covers the inferential tradition while acknowledging where the two cultures overlap.

**Agent affinity:** tukey (EDA and diagnostics), fisher (experimental design and ANOVA), breiman (model comparison)

**Concept IDs:** data-hypothesis-testing, data-confidence-intervals, data-correlation, data-normal-distribution

## The Modeling Workflow

| Stage | Goal | Key operations |
|---|---|---|
| 1. Formulation | Define the question as a model | Specify response variable, predictors, functional form |
| 2. Exploration | Understand data structure | Scatterplots, correlation matrices, distribution checks |
| 3. Fitting | Estimate parameters | OLS, MLE, MCMC, IRLS depending on model class |
| 4. Diagnostics | Check assumptions | Residual plots, Q-Q plots, leverage, VIF |
| 5. Inference | Draw conclusions | Confidence intervals, hypothesis tests, effect sizes |
| 6. Selection | Compare models | AIC, BIC, cross-validation, likelihood ratio tests |
| 7. Communication | Report results | Effect estimates with uncertainty, not just p-values |

## Linear Regression

### The Model

y = beta_0 + beta_1 * x_1 + beta_2 * x_2 + ... + beta_p * x_p + epsilon

where the errors epsilon are independent and identically distributed as N(0, sigma^2). The betas are estimated by ordinary least squares (OLS), which minimizes the sum of squared residuals.
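A minimal sketch of OLS estimation, assuming plain Python with no numerical libraries: it builds X'X and X'y and solves the normal equations (X'X) beta = X'y by Gaussian elimination. The helper names `solve` and `ols_fit` are hypothetical. Forming X'X squares the condition number, so real fitting code typically uses a QR or SVD factorization instead; this version only shows what "minimizing the sum of squared residuals" computes.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfect multicollinearity?)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ols_fit(X, y):
    """OLS estimates [beta_0, beta_1, ..., beta_p]; X is a list of predictor rows (no intercept column)."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]  # generated from y = 1 + 2*x1 + 3*x2 with no noise
print([round(b, 6) for b in ols_fit(X, y)])  # [1.0, 2.0, 3.0]
```

With noise-free data the estimates recover the generating coefficients; with real data they minimize the residual sum of squares instead.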

### Assumptions (LINE)

| Assumption | Check | Violation consequence |
|---|---|---|
| **L**inearity | Residual vs. fitted plot -- no pattern | Biased estimates; coefficients misdescribe the true relationship |
| **I**ndependence | Study design, Durbin-Watson test | Underestimated standard errors, inflated significance |
| **N**ormality of residuals | Q-Q plot, Shapiro-Wilk test | Unreliable confidence intervals and p-values (less critical for large n by CLT) |
| **E**qual variance (homoscedasticity) | Scale-location plot, Breusch-Pagan test | Inefficient estimates, unreliable standard errors |
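The Durbin-Watson statistic from the independence row is a simple ratio on the residuals. This sketch assumes the residuals are already in time (or collection) order and detects only lag-1 serial correlation; `durbin_watson` is a hypothetical helper name. The statistic is approximately 2(1 - r), where r is the lag-1 autocorrelation of the residuals, so values near 2 are consistent with no lag-1 autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 negative autocorrelation.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the residual sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


print(durbin_watson([1, -1, 1, -1]))          # 3.0: alternating signs, negative autocorrelation
print(durbin_watson([1, 1, 1, -1, -1, -1]))   # well below 2: runs of same sign, positive autocorrelation
```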

### Interpretation

- **beta_j:** The expected change in y for a one-unit increase in x_j, holding all other predictors constant.
- **R-squared:** Proportion of variance in y explained by the model. Not a measure of model quality alone -- a high R-squared with violated assumptions is meaningless.
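- **Adjusted R-squared:** Penalizes for the number of predictors, so unlike R-squared it can decrease when an uninformative predictor is added. Prefer it over raw R-squared when comparing models with different numbers of predictors, alongside AIC, BIC, or cross-validation.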
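Both measures follow directly from sums of squares. This sketch assumes an OLS fit that includes an intercept (the 1 - SS_res / SS_tot form relies on that) and takes p as the number of predictors excluding the intercept; the function names and data are illustrative.

```python
def r_squared(y, y_hat):
    """1 - SS_res / SS_tot for observed y and fitted y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalize R-squared for p predictors (intercept excluded) fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


r2 = r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
print(round(r2, 4), round(adjusted_r_squared(r2, n=4, p=1), 4))  # 0.98 0.97
```

Holding R-squared fixed, raising p lowers the adjusted value, which is the penalty at work.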

### Multicollinearity

When predictors are highly correlated, coefficient estimates become unstable: standard errors grow, and estimates can change sharply when a predictor is added or dropped. The Variance Inflation Factor (VIF) quantifies this; a VIF above roughly 5 to 10 is commonly taken to indicate problematic collinearity. Remedies: drop a predictor, combine predictors via PCA, or use regularization (ridge regression).
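In general VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on all the other predictors. For the special case of exactly two predictors, that auxiliary R^2 equals r^2, the squared Pearson correlation, so both predictors share VIF = 1 / (1 - r^2). The sketch below uses that shortcut; with more predictors each VIF_j needs its own auxiliary regression. Function names and data are hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


print(vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1]))        # 1.0: uncorrelated
print(vif_two_predictors([1, 2, 3, 4], [1.0, 2.1, 2.9, 4.2]))  # far above 10: nearly collinear
```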

## Logistic Regression

### The Model

For binary outcome y in {0, 1}:

log(p / (1 - p)) = beta_0 + beta_1 * x_1 + ... + beta_p * x_p

where p = P(y = 1 | x). The left side is the log-odds (logit). Parameters are estimated by maximum likelihood.
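Because the log-likelihood has no closed-form maximizer, the betas are found iteratively. A minimal sketch for one predictor uses Newton-Raphson, which coincides with iteratively reweighted least squares (IRLS) for the logit link; `fit_logistic` is a hypothetical helper. It assumes the two classes are not perfectly separated by x, since under separation the MLE does not exist and the coefficients grow without bound, and it omits the convergence checks and step-halving that production routines use.

```python
import math


def fit_logistic(x, y, iters=25):
    """Fit log(p / (1 - p)) = b0 + b1 * x by maximum likelihood (Newton-Raphson).

    Assumes y holds 0/1 outcomes and the classes are not perfectly separated.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and Fisher information X'WX (h00, h01, h11).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)  # IRLS weight for the logit link
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (X'WX)^{-1} X'(y - p), with the 2x2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 1, 0, 1, 1]  # overlapping classes (x=3 is 1, x=4 is 0), so the MLE exists
b0, b1 = fit_logistic(x, y)
print(b0, b1)
```

At convergence the score is zero, so the fitted probabilities sum to the observed number of successes.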

### Interpretation

- **exp(beta_j):** The odds ratio for a one-unit increase in x_j, holding other predictors constant. An odds ratio of 1.5 means 50% higher odds of the outcome.
- **Predicted probability:** p = 1 / (1 + exp(-(beta_0 + beta_1 * x_1 + ...))). The sigmoid (inverse logit) function maps the linear predictor into the open interval (0, 1).
- **No R-squared analog works well.** Pseudo-R-squared measures (McFadden, Nagelkerke) exist but should be used with caution. Prefer ROC-AUC to assess discrimination and calibration plots to check whether predicted probabilities match observed frequencies.
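Each quantity in this list is a short computation once coefficients are available. In this sketch the coefficients are made-up values chosen so that exp(beta_1) = 1.5, and the AUC is computed by its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, counting ties as one half. All function names are hypothetical.

```python
import math


def predict_prob(b0, b1, x):
    """Inverse logit (sigmoid) of the linear predictor b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def odds_ratio(b1):
    """Multiplicative change in the odds for a one-unit increase in x."""
    return math.exp(b1)


def roc_auc(y, scores):
    """AUC as P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


b0, b1 = -1.0, math.log(1.5)  # made-up coefficients with odds ratio 1.5
print(odds_ratio(b1))  # ~1.5 (up to floating-point rounding)
print(predict_prob(b0, b1, 2.0))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Going from x = 0 to x = 1 multiplies the odds p / (1 - p) by 1.5, the "50% higher odds" reading above.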

### Assumptions

- Observations are independent.
- The log-odds are a linear function of the predictors (check with partial residual plots).
- No perfect multicollinearity.
- No assumption of normality or equal variance -- this is not linear regression with a binary outcome.

## Generalized Linear Models (GLMs)

Logistic regression is one instance of the GLM framework. The general structure:

| Component | Role |
|---|---|
| **Random component** | Distribution of y (Normal, Binomial, Poisson, Gamma, ...) |
| **Systematic component** | Linear predictor eta = X * beta |
| **Link function** | g(mu) = eta, connecting the mean to the linear predictor |
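To make the three components concrete, this sketch pairs each link g with its inverse and computes the mean mu = g^{-1}(eta) from a linear predictor eta = x . beta. The `LINKS` table and `mean_from_linear_predictor` are illustrative names; a real GLM implementation would also carry each family's variance function, which is omitted here.

```python
import math

# Each entry is (g, g_inverse): g maps the mean mu to eta, g_inverse maps eta back to mu.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (lambda mu: math.log(mu / (1 - mu)), lambda eta: 1 / (1 + math.exp(-eta))),
    "log": (math.log, math.exp),
    "inverse": (lambda mu: 1 / mu, lambda eta: 1 / eta),
}


def mean_from_linear_predictor(link, x_row, beta):
    """mu = g^{-1}(x_row . beta); x_row includes a leading 1 for the intercept."""
    eta = sum(xi * bi for xi, bi in zip(x_row, beta))
    return LINKS[link][1](eta)


# The same linear predictor (eta = 1.0) implies different means under different links.
for name in LINKS:
    print(name, mean_from_linear_predictor(name, [1, 2], [0.5, 0.25]))
```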

### Common GLMs

| Response type | Distribution | Link | Model name |
|---|---|---|---|
| Continuous | Normal | Identity | Linear regression |
| Binary | Binomial | Logit | Logistic regression |
| Count | Poisson | Log | Poisson regression |
| Count (overdispersed) | Negative binomial | Log | Negative binomial regression |
| Positive continuous | Gamma | Log or inverse | Gamma regression |
| Proportion (not 0/1) | Beta | Logit | Beta regression |

### Poisson Regression

For count data: log(mu) = beta_0 + beta_1 * x_1 + ... Assumes the mean equals the variance (equidispersion). When variance > mean (overdispersion), use negative binomial or quasi-Poisson. Always check for overdispersion.
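A minimal sketch of both steps for one predictor: fit log(mu) = beta_0 + beta_1 * x by Newton-Raphson (equivalent to IRLS for the canonical log link), then compute the Pearson dispersion statistic, sum((y - mu)^2 / mu) divided by the residual degrees of freedom. Under equidispersion this ratio should be near 1; values well above 1 signal overdispersion, and the same ratio is the usual quasi-Poisson scale estimate. Function names and data are hypothetical.

```python
import math


def fit_poisson(x, y, iters=50):
    """Fit log(mu) = b0 + b1 * x by maximum likelihood (Newton-Raphson / IRLS)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(iters):
        # Score X'(y - mu) and information X' diag(mu) X for the log link.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = math.exp(b0 + b1 * xi)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += mu
            h01 += mu * xi
            h11 += mu * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


def pearson_dispersion(x, y, b0, b1):
    """Pearson chi-square divided by residual df (n minus 2 fitted coefficients)."""
    chi2 = 0.0
    for xi, yi in zip(x, y):
        mu = math.exp(b0 + b1 * xi)
        chi2 += (yi - mu) ** 2 / mu
    return chi2 / (len(y) - 2)


x = [0, 0, 0, 1, 1, 1]
y = [0, 0, 9, 0, 1, 20]  # counts far more variable than their group means (3 and 7)
b0, b1 = fit_poisson(x, y)
print(round(pearson_dispersion(x, y, b0, b1), 2))  # about 13.57, far above 1
```

A ratio this large suggests refitting with a negative binomial or quasi-Poisson model.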

## Analysis of Variance (ANOVA)

### Purpose

ANOVA tests whether group means differ. It is a special case of linear regression where all predictors are categorical.

### One-Way ANOVA

- **Null hypothesis:** mu_1 = mu_2 = ... = mu_k (all group means are equal).
- **Test statistic:** F = (between-group variance) / (within-group variance).
- **Assumptions:** Independence, normality within groups, equal variances (Levene's test).
- **Post-hoc:** If the F-test rejects, pairwise comparisons identify which groups differ. Use Tukey's HSD or Bonferroni correction to control family-wise error rate.
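The F statistic can be computed from between-group and within-group sums of squares. This sketch returns F with its degrees of freedom but not a p-value, which would come from the F(k - 1, n - k) distribution (for example via `scipy.stats.f.sf`). The Bonferroni helper divides alpha by the number of pairwise comparisons, k(k - 1)/2. Names and data are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_obs = [v for g in groups for v in g]
    n, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_b, df_w = k - 1, n - k
    # F = mean square between / mean square within
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


def bonferroni_alpha(alpha, k):
    """Per-comparison alpha when testing all k*(k-1)/2 pairs of k groups."""
    return alpha / (k * (k - 1) // 2)


print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
print(bonferroni_alpha(0.05, 3))  # 0.05 / 3 for three pairwise comparisons
```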

### Two-Way ANOVA

Adds a second factor and the interaction between the two factors. The interaction term tests whether the effect of one factor depends on the level of the other. Always plot the interaction (mean response by levels of factor A, with one line per level of factor B) before interpreting the F-test.
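The interaction plot is built from cell means. For a 2x2 design, a minimal sketch of the idea: compute the mean response in each (A, B) cell, then the interaction contrast, defined as the effect of A at one level of B minus its effect at the other. A contrast of zero corresponds to parallel lines in the plot. Factor levels and data are hypothetical, and the formal F-test for the interaction still requires fitting the two-way model.

```python
from collections import defaultdict


def cell_means(rows):
    """Mean response for each (level_a, level_b) cell; rows are (a, b, y) tuples."""
    sums, counts = defaultdict(float), defaultdict(int)
    for a, b, y in rows:
        sums[(a, b)] += y
        counts[(a, b)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def interaction_contrast_2x2(means, a_levels, b_levels):
    """(effect of A at b1) - (effect of A at b2); zero means parallel lines."""
    (a1, a2), (b1, b2) = a_levels, b_levels
    return (means[(a2, b1)] - means[(a1, b1)]) - (means[(a2, b2)] - means[(a1, b2)])


rows = [
    ("low", "ctrl", 10), ("low", "ctrl", 12), ("high", "ctrl", 14), ("high", "ctrl", 16),
    ("low", "trt", 11), ("low", "trt", 13), ("high", "trt", 23), ("high", "trt", 25),
]
m = cell_means(rows)
print(m)
print(interaction_contrast_2x2(m, ("low", "high"), ("ctrl", "trt")))  # -8.0
```

Here moving A from "low" to "high" raises the mean by 4 under "ctrl" but by 12 under "trt", so the lines in the interaction plot would not be parallel.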

### ANOVA as Regression

One-