machine-learning-foundations
Supervised and unsupervised learning, bias-variance tradeoff, cross-validation, decision trees, ensemble methods, neural network fundamentals, and the practitioner's workflow from problem framing through deployment. Covers classification, regression, clustering, dimensionality reduction, regularization, hyperparameter tuning, and evaluation metrics. Use when building predictive models, selecting algorithms, or understanding the machine learning pipeline.
git clone --depth 1 https://github.com/Tibsfox/gsd-skill-creator /tmp/machine-learning-foundations && cp -r /tmp/machine-learning-foundations/examples/skills/data-science/machine-learning-foundations ~/.claude/skills/machine-learning-foundations
SKILL.md
# Machine Learning Foundations

Machine learning is the practice of building systems that learn patterns from data and use those patterns to make predictions or decisions on new data. Where statistical modeling (the inference culture) asks "what is the relationship between X and Y?", machine learning (the prediction culture) asks "given X, what is the best prediction of Y?" This skill covers the foundational concepts, algorithms, and workflow of machine learning from the practitioner's perspective.

**Agent affinity:** breiman (algorithm selection, ensemble methods), tukey (feature engineering, EDA)

**Concept IDs:** data-correlation, data-distributions, data-measures-of-spread, data-hypothesis-testing

## The ML Workflow

| Stage | Goal | Key operations |
|---|---|---|
| 1. Problem framing | Define the task precisely | Classification vs. regression vs. clustering; define target variable and success metric |
| 2. Data collection | Assemble training data | Sources, sampling, labeling; ensure data represents the deployment population |
| 3. Feature engineering | Create informative inputs | Domain-driven features, transformations, encoding categoricals |
| 4. Train/test split | Reserve unseen data for an unbiased final evaluation | Hold out 20-30% for testing; never touch test set during development |
| 5. Model selection | Choose algorithm family | Based on data size, interpretability needs, problem structure |
| 6. Training | Fit model parameters | Optimization (gradient descent, tree splitting, etc.) |
| 7. Validation | Tune hyperparameters | k-fold cross-validation on training set only |
| 8. Evaluation | Assess on held-out test set | Metrics appropriate to the problem (accuracy, F1, RMSE, etc.) |
| 9. Interpretation | Understand what the model learned | Feature importance, partial dependence, SHAP values |
| 10. Deployment | Put the model in production | Monitoring, drift detection, retraining schedule |

## Supervised Learning

### Classification

The task: given features X, predict a categorical label y.

**Key algorithms:**

| Algorithm | Strengths | Weaknesses | When to use |
|---|---|---|---|
| **Logistic regression** | Interpretable, fast, probabilistic | Linear decision boundary | Baseline; when interpretability matters |
| **k-Nearest Neighbors** | Non-parametric, no training phase | Slow at prediction, curse of dimensionality | Small datasets, low dimensionality |
| **Decision tree** | Interpretable, handles mixed types | Overfits easily, unstable | When interpretability is paramount; as building block for ensembles |
| **Random forest** | Robust, handles high dimensions | Less interpretable than single tree | Default for tabular data |
| **Gradient boosting** | State-of-the-art tabular performance | Prone to overfitting without tuning | Competition-grade tabular prediction |
| **SVM** | Effective in high dimensions | Slow on large datasets, kernel choice | Text classification, small-medium datasets |
| **Neural network** | Learns complex patterns, scales to huge data | Requires large data, expensive, black box | Images, text, sequences, very large datasets |

### Regression

The task: given features X, predict a continuous value y.

The same algorithm families apply in their regression forms (linear regression, k-NN regression, decision tree regression, random forest regression, gradient boosting regression, neural network regression). The loss function changes from cross-entropy to squared error (or absolute error, Huber loss, etc.).
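The regression losses named above differ mainly in how they treat large residuals. The following is a minimal sketch in plain Python, not a library API: the function names, the toy data, and the Huber threshold `delta=1.0` are all illustrative assumptions. Note that the Huber loss as written uses the conventional 0.5 factor on its quadratic branch, while `squared_error` here does not.

```python
def squared_error(y, y_hat):
    """Squared error: grows quadratically with the residual."""
    return (y - y_hat) ** 2


def absolute_error(y, y_hat):
    """Absolute error: grows linearly with the residual."""
    return abs(y - y_hat)


def huber_loss(y, y_hat, delta=1.0):
    """Huber loss: quadratic for small residuals, linear beyond delta.

    delta is a hypothetical threshold chosen for illustration.
    """
    r = abs(y - y_hat)
    if r <= delta:
        return 0.5 * r ** 2
    return delta * (r - 0.5 * delta)


def mean_loss(loss, ys, y_hats, **kwargs):
    """Average a per-example loss over paired targets and predictions."""
    return sum(loss(y, p, **kwargs) for y, p in zip(ys, y_hats)) / len(ys)


# Toy data (made up): three close predictions and one large miss.
ys = [1.0, 2.0, 3.0, 4.0]
preds = [1.5, 2.0, 2.5, 14.0]

mse = mean_loss(squared_error, ys, preds)
mae = mean_loss(absolute_error, ys, preds)
huber = mean_loss(huber_loss, ys, preds, delta=1.0)

print(f"MSE={mse}, MAE={mae}, Huber={huber}")
```

On this toy data the single large miss dominates the mean squared error, while the absolute and Huber losses grow only linearly with it, which is why they are often preferred when outliers are expected.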
### Evaluation Metrics

**Classification:**

| Metric | Formula / Definition | When to use |
|---|---|---|
| **Accuracy** | Correct / Total | Balanced classes only |
| **Precision** | TP / (TP + FP) | Cost of false positives is high (spam detection) |
| **Recall** | TP / (TP + FN) | Cost of false negatives is high (cancer screening) |
| **F1 score** | 2 * Precision * Recall / (Precision + Recall) | Need balance between precision and recall |
| **ROC-AUC** | Area under ROC curve | Ranking quality across thresholds |
| **Log loss** | Negative log-likelihood of predicted probabilities | When calibrated probabilities matter |

**Regression:**

| Metric | Formula / Definition | When to use |
|---|---|---|
| **MSE** | Mean of (y - y_hat)^2 | Default; penalizes large errors |
| **RMSE** | sqrt(MSE) | Same scale as y; more interpretable |
| **MAE** | Mean of \|y - y_hat\| | Robust to outliers |
| **R-squared** | 1 - (SS_res / SS_tot) | Proportion of variance explained |
| **MAPE** | Mean of \|y - y_hat\| / \|y\| * 100 | Percentage interpretation; fails when y near 0 |

## The Bias-Variance Tradeoff

The expected prediction error decomposes into three components:

`Error = Bias^2 + Variance + Irreducible noise`

- **Bias:** Error from oversimplifying the model. A linear model fit to a quadratic relationship has high bias (underfitting).
- **Variance:** Error from model sensitivity to training data. A deep decision tree memorizes the training set and varies wildly across samples (overfitting).
- **Irreducible noise:** Inherent randomness in the data. No model can reduce this.

**The tradeoff:** Increasing model complexity reduces bias but increases variance. Decreasing complexity reduces variance but increases bias. The optimal model balances both.
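The decomposition can be estimated empirically by refitting a model on many resampled training sets and looking at its predictions at one fixed point. The sketch below is a small simulation under stated assumptions, not a definitive procedure: the true function `x ** 2`, the noise level, the sample sizes, and all function names are hypothetical. A 1-nearest-neighbor predictor stands in for the "deep decision tree" because it memorizes the training set in the same way.

```python
import random


def true_f(x):
    """Assumed ground-truth relationship (quadratic), for illustration only."""
    return x ** 2


def sample_training_set(rng, n=30, noise_sd=0.3):
    """Draw n noisy observations of true_f on [-1, 1]."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def predict_linear(xs, ys, x0):
    """Least-squares line fit to (xs, ys), evaluated at x0 (high-bias model)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my + slope * (x0 - mx)


def predict_1nn(xs, ys, x0):
    """Return the label of the closest training point (high-variance model)."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return ys[nearest]


def bias2_and_variance(predict, x0, rng, n_repeats=500):
    """Estimate squared bias and variance of predict at x0 over resampled data."""
    preds = []
    for _ in range(n_repeats):
        xs, ys = sample_training_set(rng)
        preds.append(predict(xs, ys, x0))
    mean_pred = sum(preds) / len(preds)
    bias2 = (mean_pred - true_f(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / len(preds)
    return bias2, variance


rng = random.Random(0)  # fixed seed so the run is repeatable
x0 = 0.9
linear_bias2, linear_var = bias2_and_variance(predict_linear, x0, rng)
nn_bias2, nn_var = bias2_and_variance(predict_1nn, x0, rng)

print(f"linear: bias^2={linear_bias2:.4f}, variance={linear_var:.4f}")
print(f"1-NN:   bias^2={nn_bias2:.4f}, variance={nn_var:.4f}")
```

In this setup the linear fit should show the larger squared bias and the 1-NN predictor the larger variance, mirroring the underfitting and overfitting cases described in the bullets above.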
**Regularization** controls this tradeoff by penalizing complexity:

| Method | Penalty | Effect |
|---|---|---|
| **Ridge (L2)** | Sum of beta_j^2 | Shrinks coefficients toward zero; keeps all predictors |
| **Lasso (L1)** | Sum of \|beta_j\| | Shrinks coefficients; sets some exactly to zero (feature selection) |
| **Elastic net** | Alpha * L1 + (1 - Alpha) * L2 | Combines ridge and lasso benefits |
| **Tree depth limit** | Max depth, min samples per leaf | Prevents tree from memorizing noise |
| **Dropout** | Randomly zero out neurons during training | Prevents neural network co-adaptation |
| **Early stopping** | Stop training when validation error increases | Universal; works for any iterative algorithm |

## Cross-Validation

Cross-validation estimates out-of-sample perfor