Skill · 11.2k repo stars · updated 1mo ago

fine-tuning-serving-openpi

**fine-tuning-serving-openpi** enables end-to-end fine-tuning and deployment of Physical Intelligence's OpenPI robot models (pi0, pi0-fast, pi0.5) on manipulation tasks from the ALOHA, DROID, and LIBERO environments. Use this skill to adapt OpenPI models to custom robot datasets, convert between JAX and PyTorch backends, launch policy inference servers, or troubleshoot normalization statistics and GPU memory constraints during training and deployment.

View source · Repository: AI-Research-SKILLs

Install in Claude Code

git clone --depth 1 https://github.com/Orchestra-Research/AI-Research-SKILLs /tmp/fine-tuning-serving-openpi && cp -r /tmp/fine-tuning-serving-openpi/18-multimodal/openpi ~/.claude/skills/fine-tuning-serving-openpi

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# OpenPI Fine-Tuning and Serving

End-to-end workflows for fine-tuning and serving Physical Intelligence's OpenPI models (pi0, pi0-fast, pi0.5) on robot manipulation tasks, using the public `openpi` repository. Covers blank-machine setup, JAX training, PyTorch training, checkpoint conversion, and policy inference serving.

## Quick start

Clone the public repo, install the workspace, then serve a pretrained policy:

```bash
git clone --recurse-submodules https://github.com/Physical-Intelligence/openpi.git
cd openpi
GIT_LFS_SKIP_SMUDGE=1 uv sync
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e .
uv run scripts/serve_policy.py --env DROID
```

```python
from openpi_client import websocket_client_policy

client = websocket_client_policy.WebsocketClientPolicy(host="localhost", port=8000)
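# `observation` is a dict of camera images, robot state, and a "prompt" string;
# the exact keys depend on the environment the server was started for.
result = client.infer(observation)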
actions = result["actions"]  # numpy array of shape (chunk_size, action_dim)
```
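
Because the server returns a chunk of actions rather than a single action, a control loop typically executes several steps per inference call. The following is a minimal sketch of that pattern, not the repository's own runtime: `StubPolicy` is a hypothetical stand-in for `WebsocketClientPolicy`, and `get_observation` and `apply_action` are hypothetical callbacks you would replace with your robot interface.

```python
from typing import Callable, Dict, List


class StubPolicy:
    """Hypothetical stand-in for WebsocketClientPolicy; returns a fixed-size chunk."""

    def __init__(self, chunk_size: int = 4, action_dim: int = 2) -> None:
        self.chunk_size = chunk_size
        self.action_dim = action_dim
        self.calls = 0

    def infer(self, observation: Dict) -> Dict:
        self.calls += 1
        # The real client returns a numpy array; nested lists iterate the same way.
        chunk = [[0.0] * self.action_dim for _ in range(self.chunk_size)]
        return {"actions": chunk}


def run_episode(
    policy,
    get_observation: Callable[[], Dict],
    apply_action: Callable[[List[float]], None],
    max_steps: int,
) -> int:
    """Query the policy once per chunk and execute the chunk step by step."""
    steps = 0
    while steps < max_steps:
        chunk = policy.infer(get_observation())["actions"]
        for action in chunk:
            if steps >= max_steps:
                break
            apply_action(action)
            steps += 1
    return steps


executed: List[List[float]] = []
policy = StubPolicy(chunk_size=4)
total = run_episode(
    policy,
    get_observation=lambda: {"prompt": "pick up the block"},
    apply_action=executed.append,
    max_steps=10,
)
print(f"executed {total} actions with {policy.calls} inference calls")
```

Executing the whole chunk before re-querying keeps the number of round trips low; re-querying earlier trades latency for responsiveness.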

## Core concepts

**Model family**: OpenPI implements three model variants from Physical Intelligence:

| Model | Architecture | Speed | Quality | Typical use |
|-------|-------------|-------|---------|-------------|
| pi0 | Flow-matching VLA | Baseline | Highest | Research, complex tasks |
| pi0-fast | Autoregressive action tokens | 2-5x faster | Good | Real-time control |
| pi0.5 | Upgraded pi0 with better open-world generalization | Baseline | Best | Latest default |

**Key design choices**:
- **Dual backend**: JAX (primary, official training) and PyTorch (community, deployment-friendly)
- **Config-driven**: All training/serving parameters defined in `src/openpi/training/config.py`
- **Norm stats**: Every config requires precomputed normalization statistics before training
- **WebSocket serving**: Policy servers expose a WebSocket API for low-latency inference

**Training loop invariant**: After every config or dataset change, always re-run this cycle:
1. Compute norm stats
2. Train
3. Serve checkpoint
4. Validate inference
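
The first two steps of this cycle can be scripted so they always run together. The sketch below only assembles the command lines used later in this document; `build_cycle_commands` is a hypothetical helper, and the serve and validate steps are left out because their flags depend on the checkpoint being served.

```python
import shlex
from typing import List


def build_cycle_commands(config_name: str, run_name: str) -> List[List[str]]:
    """Return argv lists for steps 1-2 of the cycle (norm stats, then training)."""
    norm_stats = [
        "uv", "run", "scripts/compute_norm_stats.py",
        "--config-name", config_name,
    ]
    train = [
        "uv", "run", "scripts/train.py", config_name,
        f"--exp-name={run_name}", "--overwrite",
    ]
    return [norm_stats, train]


commands = build_cycle_commands("my_custom_config", "run_001")
for argv in commands:
    # Print instead of executing so the sketch is safe to run anywhere.
    print(shlex.join(argv))
```

To actually launch the steps, each list could be passed to `subprocess.run(argv, check=True)` from the repository root, with `XLA_PYTHON_CLIENT_MEM_FRACTION` set in the environment for the training step.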

## Compute requirements

| Task | GPU | VRAM | Notes |
|------|-----|------|-------|
| Serve pi0.5 (inference) | 1x A100/H100 | ~24 GB | Single GPU sufficient |
| Fine-tune pi0.5 (JAX) | 1x A100 80GB | ~60 GB | Use `fsdp_devices` for multi-GPU |
| Fine-tune pi0 (JAX) | 1x A100 80GB | ~40 GB | Smaller model footprint |
| Fine-tune (PyTorch DDP) | 1-8x A100 | ~40 GB/GPU | torchrun launcher |
| Compute norm stats | CPU or 1x GPU | ~8 GB | Fast, can run on login node |

## Workflow 0: Blank-machine setup

Copy this checklist and track progress:

```text
Setup Progress:
- [ ] Step 1: Clone the public openpi repo with submodules
- [ ] Step 2: Install uv and sync the workspace
- [ ] Step 3: Install the editable package
- [ ] Step 4: Verify core imports and serving entrypoint
```

**Step 1: Clone repo**

```bash
git clone --recurse-submodules https://github.com/Physical-Intelligence/openpi.git
cd openpi
```

If you already cloned without submodules:

```bash
git submodule update --init --recursive
```

**Step 2: Sync dependencies**

```bash
GIT_LFS_SKIP_SMUDGE=1 uv sync
```

**Step 3: Install editable package**

```bash
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e .
```

**Step 4: Verify installation**

```bash
uv run python -c "from openpi.training import config as _config; print(_config.get_config('pi05_droid').name)"
uv run scripts/serve_policy.py --help
```
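
If the commands above fail, it can help to see which packages are missing before debugging further. This is a small sketch using only the standard library; the module names are taken from the imports used in this document plus `jax`, and `check_modules` is a hypothetical helper. Run it with `uv run python` so it inspects the workspace environment.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be located without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_modules(["openpi", "openpi_client", "jax"])
for name, found in status.items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```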

## When to use vs alternatives

**Use this skill when:**
- Fine-tuning pi0, pi0-fast, or pi0.5 on LeRobot or RLDS datasets
- Serving OpenPI policies for ALOHA, DROID, or LIBERO evaluation
- Converting JAX checkpoints to PyTorch format
- Debugging OpenPI training issues (norm stats, memory, config)

**Use `fine-tuning-openvla-oft` instead when:**
- Fine-tuning OpenVLA with continuous action heads and LoRA
- Reproducing OpenVLA-OFT paper results on LIBERO or ALOHA

**Use `evaluating-cosmos-policy` instead when:**
- Evaluating NVIDIA Cosmos Policy on simulation benchmarks

---

## Workflow 1: JAX fine-tuning on LeRobot data

Copy this checklist and track progress:

```text
JAX Fine-Tuning Progress:
- [ ] Step 1: Select and copy closest training config
- [ ] Step 2: Update dataset mapping and base checkpoint
- [ ] Step 3: Compute normalization statistics
- [ ] Step 4: Launch JAX training
- [ ] Step 5: Serve checkpoint and run inference sanity check
```

**Step 1: Select config**

Copy the closest config from `src/openpi/training/config.py`:

| Config | Use case |
|--------|----------|
| `pi05_libero` | pi0.5 LIBERO fine-tuning |
| `pi0_libero` | pi0 full fine-tuning on LIBERO |
| `pi0_fast_libero` | pi0-fast on LIBERO |
| `pi0_aloha_pen_uncap` | ALOHA custom data |
| `pi05_droid_finetune` | Small custom DROID dataset (LeRobot format) |
| `pi05_full_droid_finetune` | Full DROID RLDS large-scale training |

**Step 2: Update dataset and transforms**

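```python
# In src/openpi/training/config.py, add an entry modeled on the closest existing config.
# Sketch based on the repo's pi05_libero entry; verify class names and fields against
# your checkout before use.
TrainConfig(
    name="my_custom_config",
    model=pi0_config.Pi0Config(pi05=True),  # pi0.5 variant; use Pi0Config() for pi0
    data=LeRobotLiberoDataConfig(  # or your own data config class with matching transforms
        repo_id="your-org/your-dataset",
        base_config=DataConfig(prompt_from_task=True),
    ),
    # Base checkpoint must match the model type (pi05_base for pi0.5, pi0_base for pi0).
    weight_loader=weight_loaders.CheckpointWeightLoader(
        "gs://openpi-assets/checkpoints/pi05_base/params"
    ),
)
```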

Set `repo_id` for your dataset and ensure `weight_loader` matches the model type (pi0 vs pi0.5).
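
One way to keep a custom entry close to its base is to copy the base config and override only the fields that differ. The sketch below shows the pattern with `dataclasses.replace` on hypothetical stand-in classes (`DemoTrainConfig`, `DemoDataConfig`) and a placeholder base `repo_id`; assuming the repository's config objects are dataclasses, the same call should apply to them, but check your checkout.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class DemoDataConfig:
    """Hypothetical stand-in for a data config; only repo_id is modeled."""

    repo_id: str


@dataclasses.dataclass(frozen=True)
class DemoTrainConfig:
    """Hypothetical stand-in for a training config entry."""

    name: str
    data: DemoDataConfig
    num_train_steps: int = 30_000


base = DemoTrainConfig(
    name="pi05_libero",
    data=DemoDataConfig(repo_id="example-org/base-dataset"),
)

# Copy the base entry and override only the fields that differ for the new dataset.
custom = dataclasses.replace(
    base,
    name="my_custom_config",
    data=dataclasses.replace(base.data, repo_id="your-org/your-dataset"),
)
print(custom)
```

The base object is left untouched, so both entries can coexist under different names.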

**Step 3: Compute normalization statistics**

```bash
uv run scripts/compute_norm_stats.py --config-name <config_name>
```

Re-run this step before training whenever the config, dataset, or transforms change; stale statistics silently mis-scale states and actions.
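
To make the role of these statistics concrete, here is a minimal sketch of per-dimension mean/std normalization in NumPy. It illustrates the general idea only; the function names are hypothetical, and the repository's script may compute additional statistics (for example quantiles) and store them in its own format.

```python
import numpy as np


def compute_norm_stats(data: np.ndarray) -> dict:
    """Per-dimension mean and std over all frames; data has shape (num_frames, dim)."""
    return {"mean": data.mean(axis=0), "std": data.std(axis=0)}


def normalize(x: np.ndarray, stats: dict, eps: float = 1e-6) -> np.ndarray:
    """Scale inputs to roughly zero mean and unit variance before they reach the model."""
    return (x - stats["mean"]) / (stats["std"] + eps)


def unnormalize(x: np.ndarray, stats: dict, eps: float = 1e-6) -> np.ndarray:
    """Invert normalize() so model outputs are back in robot units."""
    return x * (stats["std"] + eps) + stats["mean"]


rng = np.random.default_rng(0)
# Synthetic 2-D "actions" whose dimensions have very different scales.
actions = rng.normal(loc=[0.5, -2.0], scale=[0.1, 3.0], size=(1000, 2))

stats = compute_norm_stats(actions)
normalized = normalize(actions, stats)
restored = unnormalize(normalized, stats)

# Statistics from a different dataset leave the data badly scaled.
stale_stats = {"mean": np.zeros(2), "std": np.ones(2)}
badly_scaled = normalize(actions, stale_stats)
```

With matching statistics each dimension ends up near zero mean and unit variance; with stale ones the second dimension keeps a mean near -2 and a spread near 3, which is the kind of mismatch that degrades training and inference.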

**Step 4: Launch JAX training**

```bash
XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py <config_name> \
  --exp-name=<run_name> \
  --overwrite
```
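
`XLA_PYTHON_CLIENT_MEM_FRACTION` controls the fraction of GPU memory that JAX preallocates (0.75 by default). A quick arithmetic sketch of what the setting means on an 80 GB card follows; `preallocated_gb` is a hypothetical helper, and actual usable memory also depends on other processes on the GPU.

```python
def preallocated_gb(total_vram_gb: float, mem_fraction: float) -> float:
    """GPU memory JAX would preallocate for a given XLA_PYTHON_CLIENT_MEM_FRACTION."""
    if not 0.0 < mem_fraction <= 1.0:
        raise ValueError("mem_fraction must be in (0, 1]")
    return total_vram_gb * mem_fraction


for fraction in (0.75, 0.9):
    print(f"fraction={fraction}: {preallocated_gb(80, fraction):.0f} GB of an 80 GB GPU")
```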

For full DROID RLDS training, add the `rlds` dependency group:

```bash
uv run --group rlds scripts/compute_norm_stats.py \
  --config-name pi05_full_droid_finetune \
  --max-frames 10000000

XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run --group rlds scripts/train.py \
  pi05_full_droid_finetune \
  --exp-name=<run_name> --overwrite
```

**Step 5: Serve and validate**

```bash
uv run scripts/serve_policy.py policy:checkpoint \
  --p
```

From the same repository

autoresearch (Skill)

Orchestrates end-to-end autonomous AI research projects using a two-loop architecture. The inner loop runs rapid experiment iterations with clear optimization targets. The outer loop synthesizes results, identifies patterns, and steers research direction. Routes to domain-specific skills for execution, supports continuous agent operation via Claude Code /loop and OpenClaw heartbeat, and produces research presentations and papers. Use when starting a research project, running autonomous experiments, or managing a multi-hypothesis research effort.

implementing-llms-litgpt (Skill)

Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, an educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.

mamba-architecture (Skill)

State-space model with O(n) complexity vs Transformers' O(n²). 5× faster inference, million-token sequences, no KV cache. Selective SSM with hardware-aware design. Mamba-1 (d_state=16) and Mamba-2 (d_state=128, multi-head). Models 130M-2.8B on HuggingFace.

nanogpt (Skill)

Educational GPT implementation in ~300 lines. Reproduces GPT-2 (124M) on OpenWebText. Clean, hackable code for learning transformers. By Andrej Karpathy. Perfect for understanding GPT architecture from scratch. Train on Shakespeare (CPU) or OpenWebText (multi-GPU).

rwkv-architecture (Skill)

RNN+Transformer hybrid with O(n) inference. Linear time, infinite context, no KV cache. Train like GPT (parallel), infer like RNN (sequential). Linux Foundation AI project. Production at Windows, Office, NeMo. RWKV-7 (March 2025). Models up to 14B parameters.

distributed-llm-pretraining-torchtitan (Skill)

Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.

huggingface-tokenizers (Skill)

Fast tokenizers optimized for research and production. Rust-based implementation tokenizes 1GB in <20 seconds. Supports BPE, WordPiece, and Unigram algorithms. Train custom vocabularies, track alignments, handle padding/truncation. Integrates seamlessly with transformers. Use when you need high-performance tokenization or custom tokenizer training.

sentencepiece (Skill)

Language-independent tokenizer treating text as raw Unicode. Supports BPE and Unigram algorithms. Fast (50k sentences/sec), lightweight (6MB memory), deterministic vocabulary. Used by T5, ALBERT, XLNet, mBART. Train on raw text without pre-tokenization. Use when you need multilingual support, CJK languages, or reproducible tokenization.