
autocontext

Autocontext is a control plane for measuring and analyzing Hermes agent behavior. Through the autoctx CLI, it runs scenarios, inspects Curator state, exports training data, and prepares datasets for local MLX or CUDA training. Use it when you need to evaluate agent performance, replay runs, audit skill curation decisions, generate machine-readable status reports, or prepare data for model training without directly modifying Hermes skills.


Install in Claude Code

git clone --depth 1 https://github.com/greyhaven-ai/autocontext /tmp/autocontext && cp -r /tmp/autocontext/skills/autocontext ~/.claude/skills/autocontext

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Autocontext

## Overview

Autocontext is a control plane for evaluating agent behavior, preserving useful run artifacts, exporting training data, and distilling stable behavior into local runtimes. In Hermes, use this skill when the work calls for measurement, replay, datasets, local MLX/CUDA training, or read-only analysis of Hermes skill curation.

Hermes Curator owns Hermes skill mutation. Autocontext should inspect, evaluate, replay, export, and recommend. Do not use Autocontext as a replacement for Hermes Curator, and do not edit Hermes skills directly unless the user explicitly asks for that operation.

## When to Use

- You need to run an Autocontext scenario from Hermes and inspect the result.
- You need machine-readable status for runs, solved knowledge, or training jobs.
- You need to inspect Hermes v0.12 Curator reports, skill usage counters, pinned state, or skill provenance.
- You need to export Autocontext knowledge into a reusable package or skill-like artifact.
- You need to prepare data for local MLX or CUDA training.
- You need to decide whether MCP is useful in a configured environment.

Do not use this skill for normal Hermes memory updates, direct skill consolidation, or user-local skill deletion. Those are Hermes Curator responsibilities.

## Integration Surface Order

Use the CLI first. The `autoctx` CLI is the default surface because Hermes agents can run it with normal terminal tools, see stdout and stderr, preserve logs, and debug failures without special host configuration.

MCP is optional. Use MCP when the environment already has Autocontext MCP configured and the task benefits from typed schemas, constrained invocation, or tool discovery. Do not require MCP just to wrap a command that the CLI already exposes cleanly.

Use a native Hermes runtime or OpenAI-compatible gateway when Autocontext is calling Hermes as an agent provider. Use a Hermes plugin emitter only when the user specifically needs high-fidelity live traces beyond read-only import of existing Hermes artifacts.

## CLI Quick Start

From a checkout of Autocontext:

```bash
cd autocontext
uv run autoctx --help
```

Inspect Hermes skill and curator state without modifying Hermes:

```bash
uv run autoctx hermes inspect --json
```

For a custom profile or test fixture:

```bash
uv run autoctx hermes inspect --home "$HERMES_HOME" --json
```

Install or refresh this skill into a Hermes profile:

```bash
uv run autoctx hermes export-skill --output ~/.hermes/skills/autocontext/SKILL.md --json
```

If the file already exists and the user wants to replace it:

```bash
uv run autoctx hermes export-skill --output ~/.hermes/skills/autocontext/SKILL.md --force --json
```
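
The replace-or-refuse decision above can be wrapped in a small guard. This is a minimal sketch, not part of Autocontext: `export_skill_command` is a hypothetical helper that only prints the command it would run, using the `export-skill`, `--output`, `--force`, and `--json` spellings shown above. It does not quote paths containing spaces.

```bash
# Hypothetical helper (not part of autoctx): print the export-skill command
# for a target path, adding --force only when replacement was requested.
export_skill_command() {
  target=$1
  replace=${2:-no}   # pass "yes" only when the user asked to overwrite

  # An existing file without an explicit request to replace it is an error.
  if [ -e "$target" ] && [ "$replace" != "yes" ]; then
    echo "refusing: $target exists; pass 'yes' to replace it" >&2
    return 1
  fi

  force=""
  if [ -e "$target" ] && [ "$replace" = "yes" ]; then
    force=" --force"
  fi

  printf 'uv run autoctx hermes export-skill --output %s%s --json\n' "$target" "$force"
}
```

Printing the command instead of executing it keeps the overwrite decision visible to the user before anything in the Hermes profile changes.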

## Running Autocontext From Hermes

Use `--json` whenever Hermes needs to parse the result.

```bash
RUN_ID="hermes_$(date +%s)"
uv run autoctx run --scenario grid_ctf --gens 3 --run-id "$RUN_ID" --json
uv run autoctx status "$RUN_ID" --json
uv run autoctx replay "$RUN_ID" --generation 1
```
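
When Hermes parses the result, it helps to keep stdout and stderr apart and to confirm that stdout really is JSON before passing it on. The sketch below assumes `python3` is on `PATH` (only its standard-library `json.tool` module is used) and that a `--json` invocation writes a single JSON document to stdout. `run_json_logged` is a hypothetical helper, not an `autoctx` command, and its exit code 65 is an arbitrary choice.

```bash
# Hypothetical helper: run a command, keep stdout and stderr in separate
# files, and confirm that stdout parses as JSON before anything consumes it.
run_json_logged() {
  out_file=$1
  err_file=$2
  shift 2

  # Capture the exit status without tripping `set -e` in the caller.
  status=0
  "$@" >"$out_file" 2>"$err_file" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed with exit $status; see $err_file" >&2
    return "$status"
  fi

  # json.tool exits non-zero when the file is not valid JSON.
  if ! python3 -m json.tool "$out_file" >/dev/null 2>&1; then
    echo "stdout was not valid JSON; see $out_file" >&2
    return 65
  fi
  return 0
}
```

A call such as `run_json_logged status.json status.err uv run autoctx status "$RUN_ID" --json` would then leave the parsed payload in `status.json` and any diagnostics in `status.err`.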

For a plain-language task:

```bash
uv run autoctx solve --description "Improve the support-triage response policy." --gens 3 --json
```

For one-shot judgment or improvement:

```bash
uv run autoctx judge --task-prompt "..." --output "..." --rubric "..." --json
uv run autoctx improve --task-prompt "..." --rubric "..." --rounds 3 --json
```

## Hermes Runtime Configuration

When Autocontext should call a Hermes-served model through an OpenAI-compatible gateway:

```bash
export AUTOCONTEXT_AGENT_PROVIDER=openai-compatible
export AUTOCONTEXT_AGENT_BASE_URL=http://localhost:8080/v1
export AUTOCONTEXT_AGENT_API_KEY=no-key
export AUTOCONTEXT_AGENT_DEFAULT_MODEL=hermes-3-llama-3.1-8b
uv run autoctx solve --description "..." --gens 3 --json
```

Keep provider configuration outside the skill when possible. The user or profile should own secrets, base URLs, and model names.
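
One way to keep configuration out of the skill while still failing early is a preflight check that the variables are present. This is a minimal sketch: `check_provider_env` is a hypothetical helper, and the variable names are the four shown above. It reports which names are empty and never prints their values.

```bash
# Hypothetical preflight: confirm the provider variables shown above are
# non-empty in the current shell before invoking autoctx.
check_provider_env() {
  missing=""
  for name in AUTOCONTEXT_AGENT_PROVIDER AUTOCONTEXT_AGENT_BASE_URL \
              AUTOCONTEXT_AGENT_API_KEY AUTOCONTEXT_AGENT_DEFAULT_MODEL; do
    # Indirect lookup of a fixed, trusted variable name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  unset value   # do not leave the last secret in a scratch variable

  if [ -n "$missing" ]; then
    echo "missing provider settings:$missing" >&2
    return 1
  fi
  return 0
}
```

Running `check_provider_env && uv run autoctx solve --description "..." --gens 3 --json` leaves the values themselves to the user or profile.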

## Working With Hermes Curator

Hermes v0.12 writes Curator reports under `~/.hermes/logs/curator/<timestamp>/run.json` and `REPORT.md`. It tracks skill usage in `~/.hermes/skills/.usage.json`, and protects bundled or hub-installed skills through `.bundled_manifest` and `.hub/lock.json`.

Use:

```bash
uv run autoctx hermes inspect --json
```

Read the output as an inventory:

- `agent_created_skill_count` counts Curator-eligible skills created by the user or an agent.
- `bundled_skill_count` and `hub_skill_count` are upstream-owned skills and should not be pruned by Autocontext.
- `pinned_skill_count` counts pinned skills, which Curator and agents should not modify.
- `curator.latest.counts` summarizes the latest consolidation, pruning, and archive activity.

Autocontext can use these signals for reports, datasets, and recommendations. Hermes Curator remains the writer for Hermes skill lifecycle changes.
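
Reading the inventory can be sketched as a small filter over the inspect output. The layout of that JSON is an assumption here: the sketch treats the four `*_count` fields as top-level keys of a single object, and the real output may nest them differently. `summarize_inventory` is a hypothetical helper and needs `python3` (standard library only).

```bash
# Hypothetical helper: read `autoctx hermes inspect --json` output on stdin
# and print the inventory counts, one per line.
# Assumption: the *_count fields are top-level keys of one JSON object.
summarize_inventory() {
  python3 -c '
import json
import sys

data = json.load(sys.stdin)
fields = [
    "agent_created_skill_count",
    "bundled_skill_count",
    "hub_skill_count",
    "pinned_skill_count",
]
for name in fields:
    # Print "unknown" rather than guessing when a field is absent.
    value = data.get(name, "unknown")
    print(name + "=" + str(value))
'
}
```

Used as `uv run autoctx hermes inspect --json | summarize_inventory`, it only reads, which matches the rule that Hermes Curator remains the writer.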

## Privacy Before Session and Trajectory Ingest

Curator decision reports are decision metadata and are safe to import without redaction. Session and trajectory imports are different: they contain raw model prompts and responses, which may include secrets, tokens, or content the operator did not intend to store externally.

Before recommending or running `autoctx hermes ingest-sessions` or `autoctx hermes ingest-trajectories`, explain the privacy tradeoff: the importer is read-only against `~/.hermes`, but the output JSONL contains the same content unless redaction is applied. The `--redact` flag controls this:

- `--redact standard` (the default) covers Anthropic/OpenAI keys, bearer tokens, emails, IPs, env values, paths, and high-risk file refs.
- `--redact strict` adds user-defined regexes.
- `--redact off` writes raw content, and the importer surfaces an explicit opt-in marker.

Sessions in particular live in a SQLite store, so an unwarranted ingest creates a new copy of every prompt and response. Prefer `--dry-run` first when the operator is unsure of the blast radius.
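
The dry-run-first habit can be sketched as a helper that prints the ingest command instead of running it. Everything beyond the flag spellings is an assumption: `ingest_dry_run_command` is hypothetical, `ALLOW_RAW_INGEST` is an invented opt-in variable, combining `--redact` with `--dry-run` is assumed to be accepted, and the real commands may need further arguments (such as an output path) that are not covered here.

```bash
# Hypothetical helper: print a dry-run ingest command for the chosen source
# and redaction mode. ALLOW_RAW_INGEST is an invented opt-in variable.
ingest_dry_run_command() {
  source_kind=$1            # "sessions" or "trajectories"
  redact=${2:-standard}     # "standard", "strict", or "off"

  case "$source_kind" in
    sessions|trajectories) ;;
    *) echo "unknown source: $source_kind" >&2; return 2 ;;
  esac

  case "$redact" in
    standard|strict) ;;
    off)
      # Raw output needs a deliberate opt-in from the operator.
      if [ "${ALLOW_RAW_INGEST:-}" != "1" ]; then
        echo "refusing --redact off without ALLOW_RAW_INGEST=1" >&2
        return 1
      fi
      ;;
    *) echo "unknown redaction mode: $redact" >&2; return 2 ;;
  esac

  printf 'uv run autoctx hermes ingest-%s --redact %s --dry-run\n' "$source_kind" "$redact"
}
```

Showing the operator the printed command, together with the privacy tradeoff, gives them a chance to object before any copy of the session store is written.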

## Training Path

For Autocontext-owned runs, export training data and train locally:

```bash
uv run autoctx export-training-data --scenario grid_ctf --all-runs --output training/grid_ctf.jsonl
uv run auto