
experimental-design-ds

A/B testing, randomization, sample size calculation, confounding control, and causal inference for data science. Covers the full experimental lifecycle from hypothesis formulation through power analysis, randomization strategies, blocking, factorial designs, sequential testing, and the potential outcomes framework for causal claims. Use when designing experiments, planning A/B tests, calculating sample sizes, or reasoning about causation from data.

Install in Claude Code
git clone --depth 1 https://github.com/Tibsfox/gsd-skill-creator /tmp/experimental-design-ds && cp -r /tmp/experimental-design-ds/examples/skills/data-science/experimental-design-ds ~/.claude/skills/experimental-design-ds
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# Experimental Design for Data Science

Experimental design is the discipline of collecting data so that the analysis can answer the intended question. Ronald Fisher, working at the Rothamsted agricultural station in the 1920s, formalized the three pillars -- randomization, replication, and blocking -- that remain the foundation of every modern experiment, from clinical trials to A/B tests on websites. This skill covers experimental design from the data scientist's perspective: planning experiments, ensuring valid causal inference, and avoiding the pitfalls that invalidate conclusions.

**Agent affinity:** fisher (experimental design, ANOVA), tukey (exploratory analysis of experimental results), benjamin (ethical review of experiments)

**Concept IDs:** data-hypothesis-testing, data-confidence-intervals, data-probability-basics, data-sampling-methods

## Fisher's Three Pillars

### 1. Randomization

Random assignment of experimental units to treatment groups ensures that any observed difference is either due to the treatment or due to chance -- not due to confounders. Without randomization, the groups may differ systematically in ways that bias the result.

**Mechanism:** Each unit is assigned to treatment or control by a random process (coin flip, random number generator). This does not guarantee balance -- it guarantees that imbalances are random and quantifiable by probability theory.

**Why it works:** Randomization breaks the association between treatment assignment and all potential confounders, both observed and unobserved. No observational adjustment method can do this for unobserved confounders.
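The mechanism above can be sketched in a few lines of Python. This is a minimal illustration, not a production assignment service: the function name `simple_randomize`, the seed, and the "heavy user" covariate (even-numbered ids) are all hypothetical. It shows that a coin flip per unit does not force exact balance, but leaves only small, random imbalances in group size and in a covariate the assignment never looked at.

```python
import random

def simple_randomize(unit_ids, seed=None):
    """Assign each unit to 'control' or 'treatment' by an independent fair coin flip."""
    rng = random.Random(seed)  # a seeded generator makes the assignment reproducible
    return {uid: rng.choice(["control", "treatment"]) for uid in unit_ids}

# Hypothetical population: even ids stand in for "heavy users",
# a covariate the assignment process never inspects.
units = list(range(10_000))
assignment = simple_randomize(units, seed=42)

treated = [u for u in units if assignment[u] == "treatment"]
control = [u for u in units if assignment[u] == "control"]

# Share of heavy users in each arm; both should be close to 0.5.
share_heavy_treated = sum(u % 2 == 0 for u in treated) / len(treated)
share_heavy_control = sum(u % 2 == 0 for u in control) / len(control)

print(len(treated), len(control))
print(round(share_heavy_treated, 3), round(share_heavy_control, 3))
```

The group sizes and covariate shares will differ slightly between arms; those differences are exactly the chance imbalances that the significance test accounts for.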

### 2. Replication

Multiple experimental units per group. Replication serves two purposes:

- **Statistical power:** More units means smaller standard errors and greater ability to detect real effects.
- **Generalizability:** Results from one unit might be idiosyncratic. Results replicated across many units are more convincing.

**Replication is not repetition.** Measuring the same unit 10 times is repetition (it estimates measurement error). Measuring 10 different units is replication (it estimates unit-to-unit variability, the noise against which a treatment effect must be judged).
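The link between replication and power can be made concrete with the standard error of a difference in means. The sketch below assumes two equal-sized groups with a common standard deviation; the function name `se_diff_means` and the value `sigma = 10.0` are illustrative choices, not part of the skill.

```python
import math

def se_diff_means(sigma, n_per_group):
    """Standard error of the difference between two group means,
    assuming equal group sizes and a common standard deviation sigma."""
    return sigma * math.sqrt(2.0 / n_per_group)

# Quadrupling the number of units per group halves the standard error.
for n in (100, 400, 1600):
    print(n, round(se_diff_means(10.0, n), 3))
```

Because the standard error shrinks with the square root of the number of units, halving it costs four times the replication.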

### 3. Blocking

Grouping experimental units by a known source of variability and randomizing within blocks. If you know that males and females respond differently to a treatment, block by sex and randomize within each block. This reduces unexplained variance and increases power.

**Blocking vs. stratification:** Same concept, different fields. Blocking is the experimental design term (Fisher); stratification is the sampling term.
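Randomizing within blocks might look like the sketch below. The helper `block_randomize`, the `(id, sex)` tuples, and the seed are hypothetical; the point is only that each block is shuffled and split separately, so the blocking variable is balanced across arms by construction rather than by luck.

```python
import random
from collections import defaultdict

def block_randomize(units, block_of, seed=None):
    """Randomize within blocks: shuffle each block, then split it in half.

    If a block has an odd number of units, the extra unit goes to control.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for u in units:
        blocks[block_of(u)].append(u)

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # random order within the block
        half = len(members) // 2
        for u in members[:half]:
            assignment[u] = "treatment"
        for u in members[half:]:
            assignment[u] = "control"
    return assignment

# Hypothetical units as (id, sex) tuples: 20 "F" and 40 "M".
units = [(i, "F" if i % 3 == 0 else "M") for i in range(60)]
assignment = block_randomize(units, block_of=lambda u: u[1], seed=7)

for sex in ("F", "M"):
    n_treated = sum(1 for u in units if u[1] == sex and assignment[u] == "treatment")
    n_total = sum(1 for u in units if u[1] == sex)
    print(sex, n_treated, "of", n_total, "treated")
```

The analysis should then account for the blocks (for example, by including the blocking variable as a factor), otherwise the variance reduction is not realized.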

## The Experimental Lifecycle

| Stage | Goal | Key decisions |
|---|---|---|
| 1. Hypothesis | Define what you want to learn | Primary hypothesis, secondary hypotheses, directional vs. two-sided |
| 2. Metric selection | Define how you will measure the effect | Primary metric, guardrail metrics, sensitivity metrics |
| 3. Power analysis | Determine required sample size | Minimum detectable effect, significance level, power |
| 4. Design | Choose experimental structure | Completely randomized, blocked, factorial, sequential |
| 5. Randomization | Assign units to groups | Simple, stratified, cluster randomization |
| 6. Execution | Run the experiment | Monitor for implementation errors, compliance |
| 7. Analysis | Test the hypothesis | Pre-specified analysis plan, intention-to-treat |
| 8. Interpretation | Draw conclusions | Effect size, confidence interval, practical significance |

## A/B Testing

### The Basic Framework

An A/B test is the simplest randomized experiment: two groups (A = control, B = treatment), one intervention, one primary metric.

**Steps:**

1. Define the metric (conversion rate, revenue per user, time on page).
2. Define the minimum detectable effect (MDE) -- the smallest change worth detecting.
3. Calculate sample size using power analysis.
4. Randomize users into A and B.
5. Run until the pre-specified sample size is reached.
6. Analyze using the pre-specified test (t-test, chi-squared, Mann-Whitney).
7. Report the effect size with a confidence interval.
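Steps 6 and 7 can be sketched for a conversion-rate metric as a two-proportion z-test plus a confidence interval for the difference. This is one reasonable choice of pre-specified test, not the only one; the function name `two_proportion_test` and the counts used below are made up for illustration. Only the Python standard library is used.

```python
import math
from statistics import NormalDist

def two_proportion_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test for a difference in proportions, with a Wald CI.

    Returns (difference B - A, (ci_low, ci_high), p_value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Test statistic uses the pooled rate, which is the estimate under H0: p_a == p_b.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Confidence interval uses the unpooled standard error.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, (diff - z_crit * se, diff + z_crit * se), p_value

# Made-up counts: 500/5000 conversions in A, 575/5000 in B.
diff, (ci_low, ci_high), p_value = two_proportion_test(500, 5000, 575, 5000)
print(f"lift = {diff:.4f}, 95% CI = ({ci_low:.4f}, {ci_high:.4f}), p = {p_value:.4f}")
```

Reporting the interval alongside the p-value keeps the focus on how large the effect plausibly is, not only on whether it is nonzero.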

### Sample Size Calculation

For a two-sample test of proportions (e.g., conversion rates), the required sample size per group is approximately:

n = (Z_{alpha/2} + Z_beta)^2 * (p_1 * (1 - p_1) + p_2 * (1 - p_2)) / (p_1 - p_2)^2

Where:
- Z_{alpha/2} = critical value for significance level (1.96 for alpha = 0.05, two-sided)
- Z_beta = critical value for power (0.84 for power = 0.80)
- p_1 = baseline conversion rate
- p_2 = expected conversion rate under treatment

**Key insight:** Sample size scales with the inverse square of the effect size. Detecting a 1 percentage point improvement requires roughly 4x the sample size of detecting a 2 percentage point improvement (not exactly 4x, because the variance term also changes with p_2). This is why detecting small effects requires enormous experiments.
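The formula above translates directly into code. The sketch below assumes equal group sizes and a two-sided test; the function name `sample_size_per_group` and the 10% baseline rate are illustrative. It uses `statistics.NormalDist` from the standard library to obtain the critical values instead of hard-coding 1.96 and 0.84.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample test of proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)

# Baseline of 10%, lifted by 2 and by 1 percentage points respectively.
n_2pt = sample_size_per_group(0.10, 0.12)
n_1pt = sample_size_per_group(0.10, 0.11)
print(n_2pt, n_1pt, round(n_1pt / n_2pt, 2))
```

Halving the detectable effect roughly quadruples the required sample; the ratio falls slightly short of 4 because the variance term `p2 * (1 - p2)` is smaller for the smaller lift.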

### Common A/B Testing Pitfalls

| Pitfall | Problem | Fix |
|---|---|---|
| **Peeking** | Checking results before full sample size; inflates false positive rate | Use sequential testing methods or commit to fixed horizon |
| **Multiple comparisons** | Testing many metrics inflates family-wise error | Bonferroni correction, FDR control, or pre-specify one primary metric |
| **Network effects** | One user's treatment affects another user's outcome (social platforms) | Cluster randomization (randomize by geographic region or social cluster) |
| **Novelty/primacy effects** | Users react differently to new features initially | Run experiments long enough for the effect to stabilize (often 2-4 weeks) |
| **Sample ratio mismatch** | Observed assignment ratio differs from the designed ratio, which usually indicates an implementation bug | Test actual vs. expected ratio (e.g., chi-squared goodness-of-fit) before analyzing results |
| **Simpson's paradox** | Aggregate result reverses when stratified by a confounder | Pre-stratify or check for compositional changes across groups |
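As one example of turning a row of this table into an automated check, a sample ratio mismatch test can be sketched as a chi-squared goodness-of-fit test with one degree of freedom. The function name `srm_p_value`, the user counts, and the 0.001 alert threshold are assumptions for illustration. The p-value uses the identity that, for one degree of freedom, the chi-squared survival function equals `erfc(sqrt(x / 2))`, so no external library is needed.

```python
import math

def srm_p_value(n_a, n_b, expected_share_a=0.5):
    """Chi-squared goodness-of-fit p-value for a two-arm sample ratio check."""
    total = n_a + n_b
    exp_a = total * expected_share_a
    exp_b = total - exp_a
    chi2 = (n_a - exp_a) ** 2 / exp_a + (n_b - exp_b) ** 2 / exp_b
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

# Made-up counts for a designed 50/50 split.
p_ok = srm_p_value(50_000, 50_100)    # small imbalance, consistent with chance
p_bad = srm_p_value(50_000, 51_500)   # imbalance too large to be chance

ALERT_THRESHOLD = 0.001  # assumed strict threshold; tune to your alerting policy
for label, p in (("ok", p_ok), ("bad", p_bad)):
    print(label, f"p = {p:.2e}", "SRM suspected" if p < ALERT_THRESHOLD else "no SRM flagged")
```

When the check fails, the usual response is to debug the assignment and logging pipeline before looking at any outcome metric, since the treatment comparison may be biased.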

## Randomization Strategies

| Strategy | Description | Use when |
|---|---|---|
| **Simple** | Coin flip per unit | Default; works for large samples |
| **Stratified** | Randomize within strata of important covariates | Known prognost