experiment-tracking-swanlab
SwanLab is an open-source experiment tracking tool for machine learning workflows that enables local or self-hosted run tracking with lightweight media logging. Use it when you need to track metrics, configurations, and experiments without relying on cloud-based services, particularly for PyTorch, Transformers, PyTorch Lightning, or Fastai projects that require scalar visualization and multi-format media logging capabilities.
```bash
git clone --depth 1 https://github.com/Orchestra-Research/AI-Research-SKILLs /tmp/experiment-tracking-swanlab && cp -r /tmp/experiment-tracking-swanlab/13-mlops/swanlab ~/.claude/skills/experiment-tracking-swanlab
```
SKILL.md
# SwanLab: Open-Source Experiment Tracking
## When to Use This Skill
Use SwanLab when you need to:
- **Track ML experiments** with metrics, configs, tags, and descriptions
- **Visualize training** with scalar charts and logged media
- **Compare runs** across seeds, checkpoints, and hyperparameters
- **Work locally or self-hosted** instead of depending on managed SaaS
- **Integrate** with PyTorch, Transformers, PyTorch Lightning, or Fastai
**Deployment**: Cloud, local, or self-hosted | **Media**: images, audio, text, GIFs, point clouds, molecules | **Integrations**: PyTorch, Transformers, PyTorch Lightning, Fastai
## Installation
```bash
# Install SwanLab plus the media dependencies used in this skill
pip install "swanlab>=0.7.11" "pillow>=9.0.0" "soundfile>=0.12.0"
# Add local dashboard support for mode="local" and swanlab watch
pip install "swanlab[dashboard]>=0.7.11"
# Optional framework integrations
pip install transformers pytorch-lightning fastai
# Login for cloud or self-hosted usage
swanlab login
```
`pillow` and `soundfile` are the media dependencies used by the Image and Audio examples in this skill. `swanlab[dashboard]` adds the local dashboard dependency required by `mode="local"` and `swanlab watch`.
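Before logging Image or Audio objects, it can help to confirm that the optional media packages are importable. The sketch below uses `importlib.util.find_spec` from the standard library. The helper name `missing_media_deps` is hypothetical, and it assumes the import names `PIL` (installed by pillow) and `soundfile`.

```python
import importlib.util

# pip package name -> import name (pillow installs the "PIL" module)
MEDIA_MODULES = {"pillow": "PIL", "soundfile": "soundfile"}


def missing_media_deps(modules=MEDIA_MODULES):
    """Return the pip package names whose modules cannot be found (hypothetical helper)."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_media_deps()
    if missing:
        print("Missing media dependencies:", ", ".join(missing))
    else:
        print("All media dependencies are installed.")
```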
## Quick Start
### Basic Experiment Tracking
```python
import swanlab
run = swanlab.init(
    project="my-project",
    experiment_name="baseline",
    config={
        "learning_rate": 1e-3,
        "epochs": 10,
        "batch_size": 32,
        "model": "resnet18",
    },
)

for epoch in range(run.config.epochs):
    train_loss = train_epoch()  # your own training step; returns a float
    val_loss = validate()       # your own validation step; returns a float
    swanlab.log(
        {
            "train/loss": train_loss,
            "val/loss": val_loss,
            "epoch": epoch,
        }
    )

run.finish()
```
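The loop above depends on user-defined `train_epoch()` and `validate()`. To exercise the logging structure before a real model is wired in, one option is to pass the step functions and the logger in as arguments. This is a sketch: `run_epochs` and the fake loss values are hypothetical, and in a real run you would pass `swanlab.log` as `log_fn`.

```python
from typing import Callable, Dict, List


def run_epochs(
    epochs: int,
    train_epoch: Callable[[int], float],
    validate: Callable[[int], float],
    log_fn: Callable[[Dict[str, float]], None],
) -> None:
    """Drive the epoch loop and hand each epoch's metrics to log_fn."""
    for epoch in range(epochs):
        log_fn(
            {
                "train/loss": train_epoch(epoch),
                "val/loss": validate(epoch),
                "epoch": epoch,
            }
        )


# Stand-in steps that return decreasing fake losses (no real model involved)
records: List[Dict[str, float]] = []
run_epochs(
    epochs=3,
    train_epoch=lambda e: 1.0 / (e + 1),
    validate=lambda e: 1.2 / (e + 1),
    log_fn=records.append,  # swap for swanlab.log in a real run
)
print(len(records), records[0])
```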
### With PyTorch
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms  # assumes torchvision is installed
import swanlab

run = swanlab.init(
    project="pytorch-demo",
    experiment_name="mnist-mlp",
    config={
        "learning_rate": 1e-3,
        "batch_size": 64,
        "epochs": 10,
        "hidden_size": 128,
    },
)

# MNIST training split as (1x28x28 tensor, label) pairs
train_loader = DataLoader(
    datasets.MNIST("./data", train=True, download=True, transform=transforms.ToTensor()),
    batch_size=run.config.batch_size,
    shuffle=True,
)

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, run.config.hidden_size),
    nn.ReLU(),
    nn.Linear(run.config.hidden_size, 10),
)
optimizer = optim.Adam(model.parameters(), lr=run.config.learning_rate)
criterion = nn.CrossEntropyLoss()

for epoch in range(run.config.epochs):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        logits = model(data)
        loss = criterion(logits, target)
        loss.backward()
        optimizer.step()

        # Log every 100 batches rather than every step
        if batch_idx % 100 == 0:
            swanlab.log(
                {
                    "train/loss": loss.item(),
                    "train/epoch": epoch,
                    "train/batch": batch_idx,
                }
            )

run.finish()
```
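The PyTorch loop logs every 100 batches without an explicit step. If you want every logged point on one increasing x-axis across epochs, one option is to compute a global step and pass it as `step=` to `swanlab.log`, as in the custom-step line of the Metric Logging section. `global_step` and `should_log` are hypothetical helpers; for a sized DataLoader, `steps_per_epoch` would be `len(train_loader)`.

```python
def global_step(epoch: int, batch_idx: int, steps_per_epoch: int) -> int:
    """Number of batches processed before this one, counted across all epochs."""
    return epoch * steps_per_epoch + batch_idx


def should_log(batch_idx: int, every: int = 100) -> bool:
    """Same condition as `batch_idx % 100 == 0` in the loop above."""
    return batch_idx % every == 0


# Example: 938 batches per epoch (60,000 MNIST training images at batch size 64,
# with the last partial batch kept)
steps_per_epoch = 938
logged = [
    global_step(epoch, batch_idx, steps_per_epoch)
    for epoch in range(2)
    for batch_idx in range(steps_per_epoch)
    if should_log(batch_idx)
]
print(logged[:3], logged[-1])
# In the real loop:
# swanlab.log({"train/loss": loss.item()}, step=global_step(epoch, batch_idx, len(train_loader)))
```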
## Core Concepts
### 1. Projects and Experiments
- **Project**: A collection of related experiments
- **Experiment**: A single execution of a training or evaluation workflow
```python
import swanlab
run = swanlab.init(
    project="image-classification",
    experiment_name="resnet18-seed42",
    description="Baseline run on ImageNet subset",
    tags=["baseline", "resnet18"],
    config={
        "model": "resnet18",
        "seed": 42,
        "batch_size": 64,
        "learning_rate": 3e-4,
    },
)
print(run.id)
print(run.config.learning_rate)
```
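To compare runs across seeds, one approach is to generate one set of `swanlab.init` arguments per seed and reuse the `<model>-seed<N>` naming from the example above. `seed_sweep` is a hypothetical helper; each returned dict would be unpacked into `swanlab.init(**kwargs)`, with `run.finish()` called before the next run starts.

```python
import copy
from typing import Any, Dict, List

BASE_CONFIG: Dict[str, Any] = {
    "model": "resnet18",
    "batch_size": 64,
    "learning_rate": 3e-4,
}


def seed_sweep(project: str, base_config: Dict[str, Any], seeds: List[int]) -> List[Dict[str, Any]]:
    """Build one init-kwargs dict per seed; shared tags make the runs easy to filter."""
    runs = []
    for seed in seeds:
        config = copy.deepcopy(base_config)  # leave the base config untouched
        config["seed"] = seed
        runs.append(
            {
                "project": project,
                "experiment_name": f"{config['model']}-seed{seed}",
                "tags": ["seed-sweep", config["model"]],
                "config": config,
            }
        )
    return runs


sweep = seed_sweep("image-classification", BASE_CONFIG, [0, 1, 42])
for kwargs in sweep:
    print(kwargs["experiment_name"], kwargs["config"]["seed"])
    # In a real sweep: run = swanlab.init(**kwargs); ...train...; run.finish()
```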
### 2. Configuration Tracking
```python
import swanlab

config = {
    "model": "resnet18",
    "seed": 42,
    "batch_size": 64,
    "learning_rate": 3e-4,
    "epochs": 20,
}
run = swanlab.init(project="my-project", config=config)

# Hyperparameters are available as attributes of run.config
learning_rate = run.config.learning_rate
batch_size = run.config.batch_size
```
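When hyperparameters come from the command line, `vars()` turns an `argparse.Namespace` into a plain dict that can be passed as `config=`. The flag names below are illustrative, not a SwanLab convention, and `parse_config` is a hypothetical helper.

```python
import argparse
from typing import Any, Dict, List, Optional


def parse_config(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    parser = argparse.ArgumentParser(description="Training hyperparameters")
    parser.add_argument("--model", default="resnet18")
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--batch-size", type=int, default=64)
    parser.add_argument("--learning-rate", type=float, default=3e-4)
    parser.add_argument("--epochs", type=int, default=20)
    # argparse stores "--batch-size" under the attribute name "batch_size"
    return vars(parser.parse_args(argv))


config = parse_config(["--seed", "7", "--learning-rate", "1e-3"])
print(config)
# Then: run = swanlab.init(project="my-project", config=config)
```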
### 3. Metric Logging
```python
import swanlab

# Log scalars
swanlab.log({"loss": 0.42, "accuracy": 0.91})

# Log multiple metrics (the variables come from your own training loop)
swanlab.log(
    {
        "train/loss": train_loss,
        "train/accuracy": train_acc,
        "val/loss": val_loss,
        "val/accuracy": val_acc,
        "lr": current_lr,
        "epoch": epoch,
    }
)

# Log with a custom step, e.g. a global batch counter
swanlab.log({"loss": loss}, step=global_step)
```
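The examples group metrics under `train/` and `val/` prefixes. When a training or evaluation function returns a flat dict, a small helper can add the prefix before the dict is passed to `swanlab.log`. `with_prefix` is a hypothetical helper, not part of SwanLab.

```python
from typing import Dict, Mapping


def with_prefix(prefix: str, metrics: Mapping[str, float]) -> Dict[str, float]:
    """Return a copy of metrics with keys rewritten as "<prefix>/<name>"."""
    return {f"{prefix}/{name}": value for name, value in metrics.items()}


train_metrics = {"loss": 0.52, "accuracy": 0.83}
val_metrics = {"loss": 0.61, "accuracy": 0.79}

payload = {**with_prefix("train", train_metrics), **with_prefix("val", val_metrics), "epoch": 3}
print(payload)
# Then: swanlab.log(payload)
```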
### 4. Media and Chart Logging
```python
import numpy as np
import swanlab
# Image (random RGB array; randint's upper bound is exclusive, so use 256)
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
swanlab.log({"examples/image": swanlab.Image(image, caption="Augmented sample")})
# Audio
wave = np.sin(np.linspace(0, 8 * np.pi, 16000)).astype("float32")
swanlab.log({"examples/audio": swanlab.Audio(wave, sample_rate=16000)})
# Text
swanlab.log({"examples/text": swanlab.Text("Training notes for this run.")})
# GIF video ("predictions.gif" must be an existing file on disk)
swanlab.log({"examples/video": swanlab.Video("predictions.gif", caption="Validation rollout")})
# Point cloud
points = np.random.rand(128, 3).astype("float32")
swanlab.log({"examples/point_cloud": swanlab.Object3D(points, caption="Point cloud sample")})
# Molecule
swanlab.log({"examples/molecule": swanlab.Molecule.from_smiles("CCO", caption="Ethanol")})
```
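The audio example above covers 8π radians over 16,000 samples at 16 kHz, which is about four cycles in one second: a roughly 4 Hz sine, below the range of human hearing. To log an audible test tone instead, build the time axis from the sample rate. The 440 Hz frequency, the 0.5 amplitude, and the helper name `sine_tone` are illustrative choices.

```python
import numpy as np


def sine_tone(frequency_hz: float, duration_s: float, sample_rate: int = 16000,
              amplitude: float = 0.5) -> np.ndarray:
    """Mono float32 sine wave with values in [-amplitude, amplitude]."""
    n_samples = int(round(duration_s * sample_rate))
    t = np.arange(n_samples) / sample_rate  # sample times in seconds
    return (amplitude * np.sin(2 * np.pi * frequency_hz * t)).astype(np.float32)


wave = sine_tone(440.0, duration_s=1.0)
print(wave.shape, wave.dtype)
# Then: swanlab.log({"examples/audio": swanlab.Audio(wave, sample_rate=16000)})
```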
```python
# Custom chart with swanlab.echarts
line = swanlab.echarts.Line()
line.add_xaxis(["epoch-1", "epoch-2", "epoch-3"])
line.add_yaxis("train/loss", [0.92, 0.61, 0.44])
line.set_global_opts(
    title_opts=swanlab.echarts.options.TitleOpts(title="Training Loss")
)
swanlab.log({"charts/loss_curve": line})
```
See [references/visualization.md](references/visualization.md) for more chart and media patterns.
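The custom chart above uses hard-coded values. In practice, per-epoch losses can be collected during training and turned into the two lists that `add_xaxis` and `add_yaxis` take. `LossHistory` is a hypothetical helper; only the `Line` calls in the trailing comment come from the example above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LossHistory:
    """Collect one loss value per epoch for a custom chart."""
    values: List[float] = field(default_factory=list)

    def add(self, loss: float) -> None:
        self.values.append(float(loss))

    def axes(self) -> Tuple[List[str], List[float]]:
        """Return (x-axis labels, y-values) in the "epoch-N" style used above."""
        labels = [f"epoch-{i}" for i in range(1, len(self.values) + 1)]
        return labels, list(self.values)


history = LossHistory()
for loss in [0.92, 0.61, 0.44]:  # stand-in per-epoch losses
    history.add(loss)

x, y = history.axes()
print(x, y)
# Then: line.add_xaxis(x); line.add_yaxis("train/loss", y); swanlab.log({"charts/loss_curve": line})
```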
### 5. Local and Self-Hosted Workflows
```python
import os
import swanlab
# Self-hosted or cloud login
swanlab.login(
    api_key=os.environ["SWANLAB_API_KEY"],
    host="http://your-server:5092",
)

# Local-only logging
run = swanlab.init(
    project="offline-demo",
    mode="local",
    logdir="./swanlog",
)
swanlab.log({"loss": 0.35, "epoch": 1})
run.finish()
```
```bash
# View local logs
swanlab watch -l ./swanlog
# Sync local logs later
swanlab sync ./swanlog
```
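In the login example, `os.environ["SWANLAB_API_KEY"]` raises `KeyError` when the variable is unset. One way to keep a script usable on machines without credentials is to fall back to local mode. `init_kwargs` is a hypothetical helper, and it assumes `"cloud"` is an accepted `mode` value alongside the `"local"` mode shown above; check the `swanlab.init` documentation for your installed version.

```python
import os
from typing import Any, Dict, Mapping, Optional


def init_kwargs(project: str, env: Optional[Mapping[str, str]] = None,
                logdir: str = "./swanlog") -> Dict[str, Any]:
    """Choose swanlab.init arguments based on whether an API key is configured.

    Assumption: mode="cloud" logs to the server you logged in to, while
    mode="local" only writes to logdir.
    """
    env = os.environ if env is None else env
    if env.get("SWANLAB_API_KEY"):
        return {"project": project, "mode": "cloud"}
    return {"project": project, "mode": "local", "logdir": logdir}


print(init_kwargs("offline-demo", env={}))
print(init_kwargs("offline-demo", env={"SWANLAB_API_KEY": "dummy"}))
# Real use: when mode == "cloud", call swanlab.login(...) first, then swanlab.init(**kwargs)
```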
## Integration Examples
### HuggingFace Transformers
```python
from transformers import Trainer, TrainingArguments
```