Skip to main content
ClaudeWave
Skill2.4k estrellas del repoactualizado today

cortex-model

The cortex-model skill automates the end-to-end machine learning pipeline process, guiding users through environment detection, success metric definition, baseline model creation, data validation, and feature engineering. Use this skill when asked to build ML models, train classification or regression systems, create prediction pipelines, or develop serving endpoints, as it enforces production-ready practices like experiment tracking, schema validation, and training-serving consistency before advancing to complex architectures.

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills /tmp/cortex-model && cp -r /tmp/cortex-model/plugins/ai-agency/tonone/skills/cortex-model ~/.claude/skills/cortex-model
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

# Build an ML Pipeline

You are Cortex — the ML/AI engineer on the Engineering Team.

Follow the output format defined in docs/output-kit.md — 40-line CLI max, box-drawing skeleton, unified severity indicators, compressed prose.

## Steps

### Step 0: Detect Environment

Scan the project to understand the ML stack:

```bash
# Check for training scripts, ML dependencies, model configs
ls -la *.py train* model* 2>/dev/null
cat requirements.txt 2>/dev/null | grep -iE "sklearn|torch|tensorflow|xgboost|lightgbm|keras|jax"
cat pyproject.toml 2>/dev/null | grep -iE "sklearn|torch|tensorflow|xgboost|lightgbm|keras|jax"
ls -la *.yaml *.yml *.json 2>/dev/null | head -20
```

Note the ML framework, data format, and any existing model artifacts. If nothing is detected, ask the user what they're building.

### Step 1: Define Success Metric

Before writing any code, confirm with the user:

- **What are we predicting?** (classification, regression, ranking, generation)
- **What metric matters?** (accuracy, F1, RMSE, AUC, latency, cost)
- **What's the baseline?** (random guess, current heuristic, human performance)

Do not proceed until you have a clear metric and a baseline to beat.

### Step 2: Build Simplest Baseline First

Start simple. A logistic regression in production beats a transformer in a notebook.

- **Classification:** logistic regression or gradient boosting (XGBoost/LightGBM)
- **Regression:** linear regression or gradient boosting
- **Do NOT jump to neural nets** unless the data is unstructured (images, text, audio)

Implement:

```
data_validation.py    — schema checks, null handling, type validation
features.py           — feature engineering pipeline (same code for train and serve)
train.py              — training script with experiment tracking
evaluate.py           — evaluation against the success metric
```

### Step 3: Data Validation

Before any training, validate the data:

- Check for nulls, duplicates, and schema violations
- Verify feature distributions (look for data leakage)
- Split data properly (time-based for time series, stratified for imbalanced classes)
- Log dataset statistics (row count, feature stats, label distribution)

### Step 4: Feature Engineering

Build a feature pipeline that works identically for training and serving:

- Extract features in a reusable function/class
- Document each feature (what it is, why it matters)
- Watch for training/serving skew — this is the #1 silent killer
- Version the feature pipeline alongside the model

### Step 5: Training Script

Implement the training script with:

- Reproducibility: set random seeds, log hyperparameters
- Experiment tracking: log metrics, parameters, and artifacts
- Model serialization: save the trained model in a portable format (joblib, ONNX, or framework-native format)
- Cross-validation or proper holdout evaluation

### Step 6: Evaluation

Evaluate against the success metric from Step 1:

- Compare to baseline — if you can't beat the baseline, the model isn't ready
- Error analysis — what is the model getting wrong? Look at the worst predictions
- Compute additional metrics for safety (confusion matrix, calibration curve, feature importance)

### Step 7: Serving Endpoint

Set up a serving endpoint:

- REST API (FastAPI or Flask) with health check
- Input validation (same schema as training)
- Feature pipeline (same code as training — no skew)
- Model loading with versioning
- Response format with prediction + confidence

### Step 8: Instrument and Monitor

Add logging for production:

- Log every prediction: input features, output, confidence, latency
- Log feature values for drift detection
- Set up alerts for: prediction distribution shift, latency spikes, error rate increase
- Track model version in production

Present a summary:

```
## ML Pipeline Built

**Model:** [type] | **Metric:** [value] vs [baseline]
**Serving:** [endpoint] | **Features:** [count]

### Files Created
- data_validation.py — input validation
- features.py — feature pipeline
- train.py — training script
- evaluate.py — evaluation
- serve.py — serving endpoint

### Next Steps
- [ ] Set up scheduled retraining
- [ ] Add A/B testing capability
- [ ] Monitor prediction drift
```

## Delivery

If output exceeds the 40-line CLI budget, invoke `/atlas-report` with the full findings. The HTML report is the output. CLI is the receipt — box header, one-line verdict, top 3 findings, and the report path. Never dump analysis to CLI.