mlflow
MLflow is a framework-agnostic platform for managing the complete machine learning lifecycle, enabling users to track experiments with parameters and metrics, manage versioned models in a registry, deploy models to production environments, and reproduce results consistently. Use this skill when conducting ML research, comparing model variations, collaborating on projects requiring experiment reproducibility, or integrating experiment tracking across teams using any ML framework like scikit-learn, TensorFlow, or PyTorch.
git clone --depth 1 https://github.com/Orchestra-Research/AI-Research-SKILLs /tmp/mlflow && cp -r /tmp/mlflow/13-mlops/mlflow ~/.claude/skills/mlflowSKILL.md
# MLflow: ML Lifecycle Management Platform
## When to Use This Skill
Use MLflow when you need to:
- **Track ML experiments** with parameters, metrics, and artifacts
- **Manage model registry** with versioning and stage transitions
- **Deploy models** to various platforms (local, cloud, serving)
- **Reproduce experiments** with project configurations
- **Compare model versions** and performance metrics
- **Collaborate** on ML projects with team workflows
- **Integrate** with any ML framework (framework-agnostic)
**Users**: 20,000+ organizations | **GitHub Stars**: 23k+ | **License**: Apache 2.0
## Installation
```bash
# Install MLflow
pip install mlflow
# Install with extras
pip install mlflow[extras] # Includes SQLAlchemy, boto3, etc.
# Start MLflow UI
mlflow ui
# Access at http://localhost:5000
```
## Quick Start
### Basic Tracking
```python
import mlflow
# Start a run
with mlflow.start_run():
# Log parameters
mlflow.log_param("learning_rate", 0.001)
mlflow.log_param("batch_size", 32)
# Your training code
model = train_model()
# Log metrics
mlflow.log_metric("train_loss", 0.15)
mlflow.log_metric("val_accuracy", 0.92)
# Log model
mlflow.sklearn.log_model(model, "model")
```
### Autologging (Automatic Tracking)
```python
import mlflow
from sklearn.ensemble import RandomForestClassifier
# Enable autologging
mlflow.autolog()
# Train (automatically logged)
model = RandomForestClassifier(n_estimators=100, max_depth=5)
model.fit(X_train, y_train)
# Metrics, parameters, and model logged automatically!
```
## Core Concepts
### 1. Experiments and Runs
**Experiment**: Logical container for related runs
**Run**: Single execution of ML code (parameters, metrics, artifacts)
```python
import mlflow
# Create/set experiment
mlflow.set_experiment("my-experiment")
# Start a run
with mlflow.start_run(run_name="baseline-model"):
# Log params
mlflow.log_param("model", "ResNet50")
mlflow.log_param("epochs", 10)
# Train
model = train()
# Log metrics
mlflow.log_metric("accuracy", 0.95)
# Log model
mlflow.pytorch.log_model(model, "model")
# Run ID is automatically generated
print(f"Run ID: {mlflow.active_run().info.run_id}")
```
### 2. Logging Parameters
```python
with mlflow.start_run():
# Single parameter
mlflow.log_param("learning_rate", 0.001)
# Multiple parameters
mlflow.log_params({
"batch_size": 32,
"epochs": 50,
"optimizer": "Adam",
"dropout": 0.2
})
# Nested parameters (as dict)
config = {
"model": {
"architecture": "ResNet50",
"pretrained": True
},
"training": {
"lr": 0.001,
"weight_decay": 1e-4
}
}
# Log as JSON string or individual params
for key, value in config.items():
mlflow.log_param(key, str(value))
```
### 3. Logging Metrics
```python
with mlflow.start_run():
# Training loop
for epoch in range(NUM_EPOCHS):
train_loss = train_epoch()
val_loss = validate()
# Log metrics at each step
mlflow.log_metric("train_loss", train_loss, step=epoch)
mlflow.log_metric("val_loss", val_loss, step=epoch)
# Log multiple metrics
mlflow.log_metrics({
"train_accuracy": train_acc,
"val_accuracy": val_acc
}, step=epoch)
# Log final metrics (no step)
mlflow.log_metric("final_accuracy", final_acc)
```
### 4. Logging Artifacts
```python
with mlflow.start_run():
# Log file
model.save('model.pkl')
mlflow.log_artifact('model.pkl')
# Log directory
os.makedirs('plots', exist_ok=True)
plt.savefig('plots/loss_curve.png')
mlflow.log_artifacts('plots')
# Log text
with open('config.txt', 'w') as f:
f.write(str(config))
mlflow.log_artifact('config.txt')
# Log dict as JSON
mlflow.log_dict({'config': config}, 'config.json')
```
### 5. Logging Models
```python
# PyTorch
import mlflow.pytorch
with mlflow.start_run():
model = train_pytorch_model()
mlflow.pytorch.log_model(model, "model")
# Scikit-learn
import mlflow.sklearn
with mlflow.start_run():
model = train_sklearn_model()
mlflow.sklearn.log_model(model, "model")
# Keras/TensorFlow
import mlflow.keras
with mlflow.start_run():
model = train_keras_model()
mlflow.keras.log_model(model, "model")
# HuggingFace Transformers
import mlflow.transformers
with mlflow.start_run():
mlflow.transformers.log_model(
transformers_model={
"model": model,
"tokenizer": tokenizer
},
artifact_path="model"
)
```
## Autologging
Automatically log metrics, parameters, and models for popular frameworks.
### Enable Autologging
```python
import mlflow
# Enable for all supported frameworks
mlflow.autolog()
# Or enable for specific framework
mlflow.sklearn.autolog()
mlflow.pytorch.autolog()
mlflow.keras.autolog()
mlflow.xgboost.autolog()
```
### Autologging with Scikit-learn
```python
import mlflow
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Enable autologging
mlflow.sklearn.autolog()
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Train (automatically logs params, metrics, model)
with mlflow.start_run():
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
model.fit(X_train, y_train)
# Metrics like accuracy, f1_score logged automatically
# Model logged automatically
# Training duration logged
```
### Autologging with PyTorch Lightning
```python
import mlflow
import pytorch_lightning as pl
# Enable autologging
mlflow.pytorch.autolog()
# Train
with mlflow.start_run():
trainer = pl.Trainer(max_epochs=10)
trainer.fit(model, datamodule=dm)
# Hyperparameters logged
# Training metrics logged
# Best model checkpoinOrchestrates end-to-end autonomous AI research projects using a two-loop architecture. The inner loop runs rapid experiment iterations with clear optimization targets. The outer loop synthesizes results, identifies patterns, and steers research direction. Routes to domain-specific skills for execution, supports continuous agent operation via Claude Code /loop and OpenClaw heartbeat, and produces research presentations and papers. Use when starting a research project, running autonomous experiments, or managing a multi-hypothesis research effort.
Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when need clean model implementations, educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.
State-space model with O(n) complexity vs Transformers' O(n²). 5× faster inference, million-token sequences, no KV cache. Selective SSM with hardware-aware design. Mamba-1 (d_state=16) and Mamba-2 (d_state=128, multi-head). Models 130M-2.8B on HuggingFace.
Educational GPT implementation in ~300 lines. Reproduces GPT-2 (124M) on OpenWebText. Clean, hackable code for learning transformers. By Andrej Karpathy. Perfect for understanding GPT architecture from scratch. Train on Shakespeare (CPU) or OpenWebText (multi-GPU).
RNN+Transformer hybrid with O(n) inference. Linear time, infinite context, no KV cache. Train like GPT (parallel), infer like RNN (sequential). Linux Foundation AI project. Production at Windows, Office, NeMo. RWKV-7 (March 2025). Models up to 14B parameters.
Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.
Fast tokenizers optimized for research and production. Rust-based implementation tokenizes 1GB in <20 seconds. Supports BPE, WordPiece, and Unigram algorithms. Train custom vocabularies, track alignments, handle padding/truncation. Integrates seamlessly with transformers. Use when you need high-performance tokenization or custom tokenizer training.
Language-independent tokenizer treating text as raw Unicode. Supports BPE and Unigram algorithms. Fast (50k sentences/sec), lightweight (6MB memory), deterministic vocabulary. Used by T5, ALBERT, XLNet, mBART. Train on raw text without pre-tokenization. Use when you need multilingual support, CJK languages, or reproducible tokenization.