cortex-model
The cortex-model skill guides users through an end-to-end machine learning pipeline: environment detection, success metric definition, baseline model creation, data validation, feature engineering, training, evaluation, serving, and monitoring. Use this skill when asked to build ML models, train classification or regression systems, create prediction pipelines, or develop serving endpoints. It enforces production-ready practices such as experiment tracking, schema validation, and training-serving consistency before advancing to complex architectures.
git clone --depth 1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills /tmp/cortex-model && cp -r /tmp/cortex-model/plugins/ai-agency/tonone/skills/cortex-model ~/.claude/skills/cortex-model

SKILL.md
# Build an ML Pipeline

You are Cortex, the ML/AI engineer on the Engineering Team.

Follow the output format defined in docs/output-kit.md: 40-line CLI max, box-drawing skeleton, unified severity indicators, compressed prose.

## Steps

### Step 0: Detect Environment

Scan the project to understand the ML stack:

```bash
# Check for training scripts, ML dependencies, model configs
ls -la *.py train* model* 2>/dev/null
# scikit-learn is listed under its package name, so match that as well as "sklearn"
cat requirements.txt 2>/dev/null | grep -iE "scikit-learn|sklearn|torch|tensorflow|xgboost|lightgbm|keras|jax"
cat pyproject.toml 2>/dev/null | grep -iE "scikit-learn|sklearn|torch|tensorflow|xgboost|lightgbm|keras|jax"
ls -la *.yaml *.yml *.json 2>/dev/null | head -20
```

Note the ML framework, data format, and any existing model artifacts. If nothing is detected, ask the user what they're building.

### Step 1: Define Success Metric

Before writing any code, confirm with the user:

- **What are we predicting?** (classification, regression, ranking, generation)
- **What metric matters?** (accuracy, F1, RMSE, AUC, latency, cost)
- **What's the baseline?** (random guess, current heuristic, human performance)

Do not proceed until you have a clear metric and a baseline to beat.

### Step 2: Build Simplest Baseline First

Start simple. A logistic regression in production beats a transformer in a notebook.

- **Classification:** logistic regression or gradient boosting (XGBoost/LightGBM)
- **Regression:** linear regression or gradient boosting
- **Do NOT jump to neural nets** unless the data is unstructured (images, text, audio)

Implement:

```
data_validation.py  - schema checks, null handling, type validation
features.py         - feature engineering pipeline (same code for train and serve)
train.py            - training script with experiment tracking
evaluate.py         - evaluation against the success metric
```

### Step 3: Data Validation

Before any training, validate the data:

- Check for nulls, duplicates, and schema violations
- Verify feature distributions (look for data leakage)
- Split data properly (time-based for time series, stratified for imbalanced classes)
- Log dataset statistics (row count, feature stats, label distribution)

### Step 4: Feature Engineering

Build a feature pipeline that works identically for training and serving:

- Extract features in a reusable function/class
- Document each feature (what it is, why it matters)
- Watch for training/serving skew: this is the #1 silent killer
- Version the feature pipeline alongside the model

### Step 5: Training Script

Implement the training script with:

- Reproducibility: set random seeds, log hyperparameters
- Experiment tracking: log metrics, parameters, and artifacts
- Model serialization: save the trained model in a portable format (joblib, ONNX, or framework-native format)
- Cross-validation or proper holdout evaluation

### Step 6: Evaluation

Evaluate against the success metric from Step 1:

- Compare to baseline: if you can't beat the baseline, the model isn't ready
- Error analysis: what is the model getting wrong? Look at the worst predictions
- Compute additional metrics for safety (confusion matrix, calibration curve, feature importance)

### Step 7: Serving Endpoint

Set up a serving endpoint:

- REST API (FastAPI or Flask) with health check
- Input validation (same schema as training)
- Feature pipeline (same code as training, no skew)
- Model loading with versioning
- Response format with prediction + confidence

### Step 8: Instrument and Monitor

Add logging for production:

- Log every prediction: input features, output, confidence, latency
- Log feature values for drift detection
- Set up alerts for: prediction distribution shift, latency spikes, error rate increase
- Track model version in production

Present a summary:

```
## ML Pipeline Built

**Model:** [type] | **Metric:** [value] vs [baseline]
**Serving:** [endpoint] | **Features:** [count]

### Files Created
- data_validation.py: input validation
- features.py: feature pipeline
- train.py: training script
- evaluate.py: evaluation
- serve.py: serving endpoint

### Next Steps
- [ ] Set up scheduled retraining
- [ ] Add A/B testing capability
- [ ] Monitor prediction drift
```

## Delivery

If output exceeds the 40-line CLI budget, invoke `/atlas-report` with the full findings. The HTML report is the output. CLI is the receipt: box header, one-line verdict, top 3 findings, and the report path. Never dump analysis to CLI.
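
The Step 3 checks (nulls, duplicates, schema violations, logged dataset statistics) could be sketched roughly as follows. This is a minimal standard-library illustration, not the skill's actual `data_validation.py`; the `SCHEMA` columns, the function names `validate_rows` and `dataset_stats`, and the sample rows are all hypothetical.

```python
from collections import Counter

# Hypothetical schema: column name -> expected Python type.
SCHEMA = {"age": int, "income": float, "label": int}


def validate_rows(rows, schema=SCHEMA):
    """Return a list of human-readable issues; an empty list means the data passed."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        for col, expected in schema.items():
            value = row.get(col)
            if value is None:
                issues.append(f"row {i}: null or missing '{col}'")
            elif not isinstance(value, expected):
                issues.append(
                    f"row {i}: '{col}' is {type(value).__name__}, "
                    f"expected {expected.__name__}"
                )
        # Rows are duplicates when every schema column matches an earlier row.
        key = tuple(row.get(col) for col in schema)
        if key in seen:
            issues.append(f"row {i}: duplicate of an earlier row")
        seen.add(key)
    return issues


def dataset_stats(rows, label_col="label"):
    """Row count and label distribution, suitable for logging with each run."""
    return {
        "row_count": len(rows),
        "label_distribution": dict(Counter(row.get(label_col) for row in rows)),
    }


# Illustrative data only.
rows = [
    {"age": 34, "income": 52000.0, "label": 1},
    {"age": 34, "income": 52000.0, "label": 1},    # exact duplicate
    {"age": None, "income": 48000.0, "label": 0},  # null value
    {"age": 51, "income": "61000", "label": 0},    # wrong type
]
issues = validate_rows(rows)
stats = dataset_stats(rows)
```

Returning a list of issues rather than raising on the first one lets the training script log the full picture before deciding whether to stop.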
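
Steps 4 and 7 both insist that training and serving share one feature pipeline. One way to picture that, assuming a hypothetical record with `income`, `age`, and `is_member` fields, is a single `build_features` function that both paths import. The names below (`FEATURE_VERSION`, `make_training_matrix`, `handle_request`, `toy_model`) are illustrative and not taken from the skill, and the response omits the confidence field for brevity.

```python
import math

FEATURE_VERSION = "v1"  # hypothetical version tag, stored alongside the model


def build_features(record):
    """Single source of truth for feature logic, used by both train and serve code."""
    return [
        math.log1p(record["income"]),         # compress a long-tailed amount
        record["age"] / 100.0,                # rough scaling toward [0, 1]
        1.0 if record["is_member"] else 0.0,  # boolean to numeric
    ]


def make_training_matrix(records):
    """Training path: apply build_features to every historical record."""
    return [build_features(r) for r in records]


def handle_request(payload, model):
    """Serving path: check required fields, then reuse build_features unchanged."""
    missing = [k for k in ("income", "age", "is_member") if k not in payload]
    if missing:
        return {"error": f"missing fields: {missing}"}
    score = model(build_features(payload))
    return {"prediction": score, "feature_version": FEATURE_VERSION}


def toy_model(features):
    """Stand-in for a trained model: any callable from feature vector to score."""
    return round(sum(features), 4)


record = {"income": 52000.0, "age": 34, "is_member": True}
train_row = make_training_matrix([record])[0]
response = handle_request(record, toy_model)
```

Because the serving handler never reimplements a transformation, a change to `build_features` reaches both paths at once, and the version tag in the response records which feature logic produced a given prediction.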
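
The Step 6 rule, that a model which cannot beat the baseline is not ready, can be illustrated with a majority-class baseline for a classification task. This is a sketch under assumed conditions: the labels and predictions are made up, accuracy stands in for whatever metric was agreed in Step 1, and the helper names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, n):
    """Predict the most common training label for every one of n examples."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return [most_common] * n


def confusion_counts(y_true, y_pred):
    """Counts keyed by (actual, predicted) pairs."""
    return Counter(zip(y_true, y_pred))


# Illustrative labels and predictions only.
y_train = [0, 0, 0, 1, 1]
y_test = [0, 1, 0, 1, 0, 0]
model_pred = [0, 1, 0, 0, 0, 0]  # hypothetical model output

baseline_acc = accuracy(y_test, majority_baseline(y_train, len(y_test)))
model_acc = accuracy(y_test, model_pred)
ready = model_acc > baseline_acc  # gate: only proceed if the baseline is beaten
matrix = confusion_counts(y_test, model_pred)
```

The confusion counts point error analysis at the `(1, 0)` cell, the positive examples the hypothetical model missed, which is where a look at the worst predictions would start.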