ClaudeWave
Skill · 9.6k repo stars · updated 1mo ago

experiment-tracking-swanlab

SwanLab is an open-source experiment tracking tool for machine learning workflows that enables local or self-hosted run tracking with lightweight media logging. Use it when you need to track metrics, configurations, and experiments without relying on cloud-based services, particularly for PyTorch, Transformers, PyTorch Lightning, or Fastai projects that require scalar visualization and multi-format media logging capabilities.

Install in Claude Code
git clone --depth 1 https://github.com/Orchestra-Research/AI-Research-SKILLs /tmp/experiment-tracking-swanlab && cp -r /tmp/experiment-tracking-swanlab/13-mlops/swanlab ~/.claude/skills/experiment-tracking-swanlab
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# SwanLab: Open-Source Experiment Tracking

## When to Use This Skill

Use SwanLab when you need to:
- **Track ML experiments** with metrics, configs, tags, and descriptions
- **Visualize training** with scalar charts and logged media
- **Compare runs** across seeds, checkpoints, and hyperparameters
- **Work locally or self-hosted** instead of depending on managed SaaS
- **Integrate** with PyTorch, Transformers, PyTorch Lightning, or Fastai

**Deployment**: Cloud, local, or self-hosted | **Media**: images, audio, text, GIFs, point clouds, molecules | **Integrations**: PyTorch, Transformers, PyTorch Lightning, Fastai

## Installation

```bash
# Install SwanLab plus the media dependencies used in this skill
pip install "swanlab>=0.7.11" "pillow>=9.0.0" "soundfile>=0.12.0"

# Add local dashboard support for mode="local" and swanlab watch
pip install "swanlab[dashboard]>=0.7.11"

# Optional framework integrations
pip install transformers pytorch-lightning fastai

# Login for cloud or self-hosted usage
swanlab login
```

`pillow` and `soundfile` are the media dependencies used by the Image and Audio examples in this skill. `swanlab[dashboard]` adds the local dashboard dependency required by `mode="local"` and `swanlab watch`.
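Before running the examples, it can help to confirm that the packages installed above are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` mapping are illustrative and not part of SwanLab.

```python
from importlib.util import find_spec

# Maps pip package names to the module names they expose on import.
# "PIL" is the import name of pillow.
REQUIRED = {"swanlab": "swanlab", "pillow": "PIL", "soundfile": "soundfile"}


def missing_packages(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


missing = missing_packages()
print("Missing packages:", ", ".join(missing) or "none")
```

Anything reported as missing can be installed with the `pip install` commands above.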

## Quick Start

### Basic Experiment Tracking

```python
import swanlab

run = swanlab.init(
    project="my-project",
    experiment_name="baseline",
    config={
        "learning_rate": 1e-3,
        "epochs": 10,
        "batch_size": 32,
        "model": "resnet18",
    },
)

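# train_epoch() and validate() are placeholders for your own training and
# evaluation code; each should return a float loss.
for epoch in range(run.config.epochs):
    train_loss = train_epoch()
    val_loss = validate()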

    swanlab.log(
        {
            "train/loss": train_loss,
            "val/loss": val_loss,
            "epoch": epoch,
        }
    )

run.finish()
```

### With PyTorch

```python
import torch
import torch.nn as nn
import torch.optim as optim
import swanlab

run = swanlab.init(
    project="pytorch-demo",
    experiment_name="mnist-mlp",
    config={
        "learning_rate": 1e-3,
        "batch_size": 64,
        "epochs": 10,
        "hidden_size": 128,
    },
)

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, run.config.hidden_size),
    nn.ReLU(),
    nn.Linear(run.config.hidden_size, 10),
)
optimizer = optim.Adam(model.parameters(), lr=run.config.learning_rate)
criterion = nn.CrossEntropyLoss()

# train_loader is assumed to be a torch.utils.data.DataLoader over MNIST,
# defined elsewhere.
for epoch in range(run.config.epochs):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        logits = model(data)
        loss = criterion(logits, target)
        loss.backward()
        optimizer.step()

        if batch_idx % 100 == 0:
            swanlab.log(
                {
                    "train/loss": loss.item(),
                    "train/epoch": epoch,
                    "train/batch": batch_idx,
                }
            )

run.finish()
```
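The loop above logs the loss of a single batch every 100 steps, which can produce a noisy curve. One possible alternative, sketched here as a design choice rather than a SwanLab feature, is to log the mean loss over the batches seen since the last log call. `RunningMean` is a hypothetical helper written for this example.

```python
import math


class RunningMean:
    """Accumulates values and reports their mean since the last reset."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, value: float) -> None:
        """Add one observation, for example loss.item() for a batch."""
        self.total += value
        self.count += 1

    def pop(self) -> float:
        """Return the mean of the values seen so far, then reset.

        Returns NaN when no values were recorded.
        """
        mean = self.total / self.count if self.count else math.nan
        self.total, self.count = 0.0, 0
        return mean


meter = RunningMean()
for loss_value in (1.0, 3.0):  # stand-ins for loss.item() on two batches
    meter.update(loss_value)
print(meter.pop())  # 2.0
```

In the training loop this would mean calling `meter.update(loss.item())` on every batch and passing `meter.pop()` to `swanlab.log` on every 100th batch.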

## Core Concepts

### 1. Projects and Experiments

**Project**: Collection of related experiments  
**Experiment**: Single execution of a training or evaluation workflow

```python
import swanlab

run = swanlab.init(
    project="image-classification",
    experiment_name="resnet18-seed42",
    description="Baseline run on ImageNet subset",
    tags=["baseline", "resnet18"],
    config={
        "model": "resnet18",
        "seed": 42,
        "batch_size": 64,
        "learning_rate": 3e-4,
    },
)

print(run.id)
print(run.config.learning_rate)
```
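When comparing runs across seeds, consistent experiment names and tags make the runs easier to find in a project. The sketch below builds one set of `swanlab.init` keyword arguments per seed. The helper names `experiment_name` and `seed_sweep` are hypothetical, and only the `project`, `experiment_name`, `tags`, and `config` arguments shown above are used.

```python
def experiment_name(model: str, seed: int) -> str:
    """Build a name such as 'resnet18-seed42'."""
    return f"{model}-seed{seed}"


def seed_sweep(project: str, model: str, seeds: list[int], base_config: dict) -> list[dict]:
    """Return one dict of swanlab.init keyword arguments per seed."""
    runs = []
    for seed in seeds:
        runs.append(
            {
                "project": project,
                "experiment_name": experiment_name(model, seed),
                "tags": [model, f"seed{seed}"],
                # Copy the shared config so each run gets its own seed.
                "config": {**base_config, "model": model, "seed": seed},
            }
        )
    return runs


for kwargs in seed_sweep("image-classification", "resnet18", [41, 42, 43], {"batch_size": 64}):
    print(kwargs["experiment_name"], kwargs["config"]["seed"])
```

Each dict can then be unpacked into a run with `swanlab.init(**kwargs)`, followed by training and `run.finish()`.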

### 2. Configuration Tracking

```python
config = {
    "model": "resnet18",
    "seed": 42,
    "batch_size": 64,
    "learning_rate": 3e-4,
    "epochs": 20,
}

run = swanlab.init(project="my-project", config=config)

learning_rate = run.config.learning_rate
batch_size = run.config.batch_size
```
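For larger projects, a typed config object can catch misspelled keys before a run starts. The following sketch uses a standard-library dataclass and converts it to a plain dict for the `config` argument. `TrainConfig` is a hypothetical class for this example, not a SwanLab type.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical typed container for the hyperparameters shown above."""

    model: str = "resnet18"
    seed: int = 42
    batch_size: int = 64
    learning_rate: float = 3e-4
    epochs: int = 20


# Override one field; a misspelled field name raises TypeError here.
config = asdict(TrainConfig(learning_rate=1e-3))
print(config)

# The plain dict can then be passed on as before:
# run = swanlab.init(project="my-project", config=config)
```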

### 3. Metric Logging

```python
# Log scalars
swanlab.log({"loss": 0.42, "accuracy": 0.91})

# Log multiple metrics
swanlab.log(
    {
        "train/loss": train_loss,
        "train/accuracy": train_acc,
        "val/loss": val_loss,
        "val/accuracy": val_acc,
        "lr": current_lr,
        "epoch": epoch,
    }
)

# Log with custom step
swanlab.log({"loss": loss}, step=global_step)
```
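The `train/` and `val/` prefixes and the custom `step` value can be produced by small helpers so that metric names stay consistent across a codebase. This is a minimal sketch; `with_prefix` and `global_step` are hypothetical names, and the step formula assumes every epoch has the same number of batches.

```python
def with_prefix(prefix: str, metrics: dict[str, float]) -> dict[str, float]:
    """Namespace metric names, e.g. 'loss' becomes 'train/loss'."""
    return {f"{prefix}/{name}": value for name, value in metrics.items()}


def global_step(epoch: int, batch_idx: int, steps_per_epoch: int) -> int:
    """Convert a zero-based (epoch, batch) position into one running step."""
    return epoch * steps_per_epoch + batch_idx


payload = {
    **with_prefix("train", {"loss": 0.42, "accuracy": 0.91}),
    **with_prefix("val", {"loss": 0.55}),
}
step = global_step(epoch=2, batch_idx=10, steps_per_epoch=100)
print(payload, step)

# The result can be passed to the call shown above:
# swanlab.log(payload, step=step)
```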

### 4. Media and Chart Logging

```python
import numpy as np
import swanlab

# Image
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)  # upper bound is exclusive
swanlab.log({"examples/image": swanlab.Image(image, caption="Augmented sample")})

# Audio
wave = np.sin(np.linspace(0, 8 * np.pi, 16000)).astype("float32")
# Reshape to (num_channels, num_frames); a mono clip has one channel
swanlab.log({"examples/audio": swanlab.Audio(wave.reshape(1, -1), sample_rate=16000)})

# Text
swanlab.log({"examples/text": swanlab.Text("Training notes for this run.")})

# GIF video
swanlab.log({"examples/video": swanlab.Video("predictions.gif", caption="Validation rollout")})

# Point cloud
points = np.random.rand(128, 3).astype("float32")
swanlab.log({"examples/point_cloud": swanlab.Object3D(points, caption="Point cloud sample")})

# Molecule
swanlab.log({"examples/molecule": swanlab.Molecule.from_smiles("CCO", caption="Ethanol")})
```
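Audio can also come from a file on disk. The sketch below writes a short mono test tone with the standard-library `wave` module. It assumes that `swanlab.Audio` accepts a file path as well as an array; check the SwanLab documentation for your version. `write_sine_wav` is a hypothetical helper.

```python
import math
import struct
import wave


def write_sine_wav(
    path: str,
    freq_hz: float = 440.0,
    seconds: float = 1.0,
    sample_rate: int = 16000,
) -> int:
    """Write a mono 16-bit PCM sine tone and return the number of frames."""
    n_frames = round(seconds * sample_rate)
    frames = b"".join(
        # Half of full scale to leave headroom; "<h" is little-endian int16.
        struct.pack(
            "<h",
            int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate)),
        )
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return n_frames


# Assumption: swanlab.Audio takes a path to an audio file.
# write_sine_wav("tone.wav")
# swanlab.log({"examples/audio_file": swanlab.Audio("tone.wav")})
```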

```python
# Custom chart with swanlab.echarts
line = swanlab.echarts.Line()
line.add_xaxis(["epoch-1", "epoch-2", "epoch-3"])
line.add_yaxis("train/loss", [0.92, 0.61, 0.44])
line.set_global_opts(
    title_opts=swanlab.echarts.options.TitleOpts(title="Training Loss")
)

swanlab.log({"charts/loss_curve": line})
```

See [references/visualization.md](references/visualization.md) for more chart and media patterns.

### 5. Local and Self-Hosted Workflows

```python
import os
import swanlab

# Self-hosted login; omit host to use SwanLab Cloud
swanlab.login(
    api_key=os.environ["SWANLAB_API_KEY"],
    host="http://your-server:5092",
)

# Local-only logging
run = swanlab.init(
    project="offline-demo",
    mode="local",
    logdir="./swanlog",
)

swanlab.log({"loss": 0.35, "epoch": 1})
run.finish()
```

```bash
# View local logs
swanlab watch -l ./swanlog

# Sync local logs later
swanlab sync ./swanlog
```
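A script that runs both on a laptop and on a shared server may want to pick the mode automatically. The sketch below falls back to local logging when no API key is set. `resolve_mode` is a hypothetical helper, and `"cloud"` as a value for `mode` is an assumption not shown in the examples above.

```python
from collections.abc import Mapping


def resolve_mode(env: Mapping[str, str]) -> str:
    """Return 'cloud' when SWANLAB_API_KEY is set and non-empty, else 'local'."""
    return "cloud" if env.get("SWANLAB_API_KEY", "").strip() else "local"


print(resolve_mode({}))                             # local
print(resolve_mode({"SWANLAB_API_KEY": "abc123"}))  # cloud

# Possible usage with the real environment:
# import os
# run = swanlab.init(project="offline-demo", mode=resolve_mode(os.environ), logdir="./swanlog")
```

Runs logged with `mode="local"` stay in `./swanlog` and can be viewed with `swanlab watch` as shown above.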

## Integration Examples

### HuggingFace Transformers

```python
from transformers import Trainer, TrainingArgum
```