Skill · 2.5k repo stars · updated 6d ago

evals-context

The evals-context skill provides structural guidance for the Roo Code monorepo's evaluation system, distinguishing between the core execution infrastructure in packages/evals, the management UI at apps/web-evals, and the public results display at apps/web-roo-code/src/app/evals. Use this skill when tasks involve modifying eval exercises, debugging the evaluation execution system, working with the evals web interface, or understanding how eval-related code is organized across the repository.

Repository: Vibe-Skills

Install in Claude Code

git clone --depth 1 https://github.com/foryourhealth111-pixel/Vibe-Skills /tmp/evals-context && cp -r /tmp/evals-context/bundled/skills/evals-context ~/.claude/skills/evals-context

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Evals Codebase Context

## When to Use This Skill

Use this skill when the task involves:

- Modifying or debugging the evals execution infrastructure
- Adding new eval exercises or languages
- Working with the evals web interface (apps/web-evals)
- Modifying the public evals display page on roocode.com
- Understanding where evals code lives in this monorepo

## When NOT to Use This Skill

Do NOT use this skill when:

- Working on unrelated parts of the codebase (extension, webview-ui, etc.)
- The task is purely about the VS Code extension's core functionality
- Working on the main website pages that don't involve evals

## Key Disambiguation: Evals Locations

Evals-related code lives in **three distinct locations in this monorepo, plus one external repository**, and they are easy to confuse:

| Component                   | Path                                                           | Purpose                                                        |
| --------------------------- | -------------------------------------------------------------- | -------------------------------------------------------------- |
| **Evals Execution System**  | `packages/evals/`                                              | Core eval infrastructure: CLI, DB schema, Docker configs       |
| **Evals Management UI**     | `apps/web-evals/`                                              | Next.js app for creating/monitoring eval runs (localhost:3446) |
| **Website Evals Page**      | `apps/web-roo-code/src/app/evals/`                             | Public roocode.com page displaying eval results                |
| **External Exercises Repo** | [Roo-Code-Evals](https://github.com/RooCodeInc/Roo-Code-Evals) | Actual coding exercises (NOT in this monorepo)                 |
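
As a quick illustration of the table above, the sketch below maps a repo-relative path to the component it belongs to. The path prefixes come from the table; the `EvalsComponent` type and `classifyEvalsPath` function are hypothetical names introduced here and are not part of the monorepo.

```typescript
// Hypothetical helper (not part of the monorepo): classify a repo-relative path
// using the three in-repo locations listed in the table above.
type EvalsComponent = "execution-system" | "management-ui" | "website-page" | "not-evals";

const PREFIXES: ReadonlyArray<readonly [string, EvalsComponent]> = [
  ["packages/evals/", "execution-system"],
  ["apps/web-evals/", "management-ui"],
  ["apps/web-roo-code/src/app/evals/", "website-page"],
];

function classifyEvalsPath(repoRelativePath: string): EvalsComponent {
  // Normalize Windows separators and a leading "./" so the prefixes match consistently.
  const normalized = repoRelativePath.replace(/\\/g, "/").replace(/^\.\//, "");
  for (const [prefix, component] of PREFIXES) {
    if (normalized.startsWith(prefix)) return component;
  }
  return "not-evals";
}

console.log(classifyEvalsPath("packages/evals/src/cli/runTask.ts")); // execution-system
console.log(classifyEvalsPath("apps/web-roo-code/src/app/page.tsx")); // not-evals
```

The exercises themselves never match any prefix, because they live in the external Roo-Code-Evals repository rather than in this monorepo.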

## Directory Structure Reference

### `packages/evals/` - Core Evals Package

```
packages/evals/
├── ARCHITECTURE.md          # Detailed architecture documentation
├── ADDING-EVALS.md          # Guide for adding new exercises/languages
├── README.md                # Setup and running instructions
├── docker-compose.yml       # Container orchestration
├── Dockerfile.runner        # Runner container definition
├── Dockerfile.web           # Web app container
├── drizzle.config.ts        # Database ORM config
├── src/
│   ├── index.ts             # Package exports
│   ├── cli/                 # CLI commands for running evals
│   │   ├── runEvals.ts      # Orchestrates complete eval runs
│   │   ├── runTask.ts       # Executes individual tasks in containers
│   │   ├── runUnitTest.ts   # Validates task completion via tests
│   │   └── redis.ts         # Redis pub/sub integration
│   ├── db/
│   │   ├── schema.ts        # Database schema (runs, tasks)
│   │   ├── queries/         # Database query functions
│   │   └── migrations/      # SQL migrations
│   └── exercises/
│       └── index.ts         # Exercise loading utilities
└── scripts/
    └── setup.sh             # Local macOS setup script
```

### `apps/web-evals/` - Evals Management Web App

```
apps/web-evals/
├── src/
│   ├── app/
│   │   ├── page.tsx         # Home page (runs list)
│   │   ├── runs/
│   │   │   ├── new/         # Create new eval run
│   │   │   └── [id]/        # View specific run status
│   │   └── api/runs/        # SSE streaming endpoint
│   ├── actions/             # Server actions
│   │   ├── runs.ts          # Run CRUD operations
│   │   ├── tasks.ts         # Task queries
│   │   ├── exercises.ts     # Exercise listing
│   │   └── heartbeat.ts     # Controller health checks
│   ├── hooks/               # React hooks (SSE, models, etc.)
│   └── lib/                 # Utilities and schemas
```
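
The `api/runs/` route is described above as an SSE streaming endpoint. As a rough sketch of what such an endpoint writes to the wire, the function below serializes one event in the standard Server-Sent Events text format. The `RunEvent` shape, the `toSseFrame` name, and the `taskFinished` event name are assumptions made for illustration; the actual payloads used by `apps/web-evals` are not shown in this skill.

```typescript
// Hypothetical event shape; the real event types in apps/web-evals may differ.
interface RunEvent {
  type: string; // must not contain newlines, since SSE fields are line-delimited
  runId: number;
  payload: unknown;
}

// Serialize one event as a Server-Sent Events frame:
// an "event:" line, a "data:" line, and a blank line that terminates the frame.
function toSseFrame(runEvent: RunEvent): string {
  // JSON.stringify without indentation yields a single line, so one "data:" field suffices.
  const data = JSON.stringify({ runId: runEvent.runId, payload: runEvent.payload });
  return `event: ${runEvent.type}\ndata: ${data}\n\n`;
}

const frame = toSseFrame({ type: "taskFinished", runId: 1, payload: { status: "passed" } });
console.log(JSON.stringify(frame));
```

On the browser side, an `EventSource` listener registered for the same event name would receive the `data` string and parse it with `JSON.parse`.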

### `apps/web-roo-code/src/app/evals/` - Public Website Evals Page

```
apps/web-roo-code/src/app/evals/
├── page.tsx      # Fetches and displays public eval results
├── evals.tsx     # Main evals display component
├── plot.tsx      # Visualization component
└── types.ts      # EvalRun type (extends packages/evals types)
```

This page **displays** eval results on the public roocode.com website. It imports types from `@roo-code/evals` but does NOT run evals.

## Architecture Overview

The evals system is a distributed evaluation platform that runs AI coding tasks in isolated VS Code environments:

```
┌─────────────────────────────────────────────────────────────┐
│  Web App (apps/web-evals)  ──────────────────────────────── │
│        │                                                    │
│        ▼                                                    │
│  PostgreSQL ◄────► Controller Container                     │
│        │               │                                    │
│        ▼               ▼                                    │
│     Redis ◄───► Runner Containers (1-25 parallel)           │
└─────────────────────────────────────────────────────────────┘
```

**Key components:**

- **Controller**: Orchestrates eval runs, spawns runners, manages task queue (p-queue)
- **Runner**: Isolated Docker container with VS Code + Roo Code extension + language runtimes
- **Redis**: Pub/sub for real-time events (NOT task queuing)
- **PostgreSQL**: Stores runs, tasks, metrics
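
The distinction between pub/sub and task queuing in the list above can be illustrated with a small in-memory stand-in. This is not the Redis client API, and the `PubSub` class and the `run:1` channel name are hypothetical. It only shows the property that matters here: a published message reaches whoever is subscribed at that moment and is not stored for later consumers, which is why pub/sub suits live event streaming but not durable task queues.

```typescript
type Handler = (message: string) => void;

// In-memory stand-in for a pub/sub channel (illustrative only, not a Redis client).
class PubSub {
  private handlers = new Map<string, Set<Handler>>();

  // Registers a handler and returns a function that removes it again.
  subscribe(channel: string, handler: Handler): () => void {
    const set = this.handlers.get(channel) ?? new Set<Handler>();
    set.add(handler);
    this.handlers.set(channel, set);
    return () => {
      set.delete(handler);
    };
  }

  // Delivers to current subscribers only; returns how many received the message.
  publish(channel: string, message: string): number {
    const set = this.handlers.get(channel);
    if (!set) return 0;
    set.forEach((handler) => handler(message));
    return set.size;
  }
}

const bus = new PubSub();
const received: string[] = [];

// Nobody is subscribed yet, so this event is dropped rather than stored.
const deliveredEarly = bus.publish("run:1", "task-started");

const unsubscribe = bus.subscribe("run:1", (message) => received.push(message));
const deliveredLate = bus.publish("run:1", "task-finished");
unsubscribe();

console.log(deliveredEarly, deliveredLate, received); // 0 1 [ 'task-finished' ]
```

A work queue would instead retain `task-started` until a consumer claimed it, which matches the note above that task queuing is handled by the controller (p-queue) rather than by Redis.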

## Common Tasks Quick Reference

### Adding a New Eval Exercise

1. Add exercise to [Roo-Code-Evals](https://github.com/RooCodeInc/Roo-Code-Evals) repo (external)
2. See [`packages/evals/ADDING-EVALS.md`](packages/evals/ADDING-EVALS.md) for structure

### Modifying Eval CLI Behavior

Edit files in [`packages/evals/src/cli/`](packages/evals/src/cli/):

- [`runEvals.ts`](packages/evals/src/cli/runEvals.ts) - Run orchestration
- [`runTask.ts`](packages/evals/src/cli/runTask.ts) - Task execution
- [`runUnitTest.ts`](packages/evals/src/cli/runUnitTest.ts) - Test validation

### Modifying the Evals Web Interface

Edit files in [`apps/web-evals/src/`](apps/web-evals/src/):

- [`app/runs/new/new-run.tsx`](apps/web-evals/src/app/runs/new/new-run.tsx) - New run form
- [`actions/runs.ts`](apps/web-evals/src/actions/runs.ts) - Run server actions

### Modifyi
