
agent-wiki-ingest

Ingest one or more agent trajectories (raw bob/claude traces or normalized JSON) into an agent-wiki end-to-end — convert, summarize, extract guidelines, synthesize skills, consolidate into clusters, and catalog. Use when you have a batch of traces to turn into a wiki in one pass.

Install in Claude Code
git clone --depth 1 https://github.com/AgentToolkit/altk-evolve /tmp/agent-wiki-ingest && cp -r /tmp/agent-wiki-ingest/explorations/agent-wiki/skills/agent-wiki-ingest ~/.claude/skills/agent-wiki-ingest
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Agent Wiki — Ingest (end-to-end orchestrator)

## Overview

This is the **one-pass entry point** for turning a batch of raw trajectories
into a fully-built wiki. It orchestrates the rest of the `agent-wiki` family
in the right order so no pass is skipped — in particular the
cross-trajectory **consolidation** pass, which is easy to forget when each
skill is invoked by hand.

You — the driving agent — run this by **spawning one subagent per
(trace × pass)**, not by doing the work inline. That keeps your own context
small (you never load every trace's full JSON) and lets independent passes
run in parallel. Each subagent acts as the corresponding single-purpose
skill (`agent-wiki-summarize`, `-extract-guidelines`, `-synthesize-skill`,
`-consolidate-guidelines`); this skill only sequences them and passes the
per-trace adapter notes.

The pipeline:

```
0.  Convert    raw bob / claude traces → normalized analysis JSON   (skip if already normalized)
1.  Bootstrap  create wiki scaffold + seed catalog                  (skip if wiki exists)
1.5 Skip       drop traces whose summaries/<sid>.md already exists   [pre-flight — idempotency]
2.  Summarize  1 subagent / new-trace → summaries/<sid>.md          [PARALLEL]
3.  Extract    1 subagent / new-trace → guidelines/*.md (+tags)     [SEQUENTIAL]
4.  Synthesize 1 subagent / new-trace → skills/<slug>/ --archive-covered  [SEQUENTIAL]
5.  Consolidate 1 subagent over the whole corpus → cluster pages    [SINGLE — MANDATORY]
6.  Catalog    final bookkeeping → indexes, used-by, priority       [you run this directly]
```

**Idempotent by default.** Re-running on the same source dir reprocesses
nothing: Step 1.5 filters out every trace that already has a summary page,
so Steps 2–4 only touch genuinely new traces. The consolidate + catalog tail
always runs (it's cheap and self-idempotent). To force a redo of an
already-ingested trace, keep it in the work-list despite the Step 1.5 filter
and pass `--rewrite` to its `render-*` calls.

**Why this order.** `synthesize-skill` runs *before* `consolidate-guidelines`
so skills claim recipe-level territory first (and archive the atomics they
cover via `--archive-covered`); consolidation then clusters only the
*surviving* atomics. This matches the consolidate skill's own rule — "don't
propose clusters that overlap a skill's territory."

**Why parallel vs sequential.** Summarize writes one independent file per
trace (`summaries/<sid>.md`) → safe to parallelize. Extract and synthesize
both mutate shared state (`guidelines/_id_index.json`, `skills/_id_index.json`,
`_config.yaml`, and the `_archived/` moves) → run them **one trace at a
time** to avoid lost-update races.
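A minimal sketch of that scheduling in Python follows. It assumes a hypothetical `run_pass(pass_name, trace)` callable that stands in for spawning one subagent; in practice each call is an LLM subagent, not a Python function. The sketch runs all of Step 2 concurrently, then Steps 3 and 4 one trace at a time, then the single consolidate pass. Running every extract before any synthesize is an assumption read off the step numbering.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical stand-in for "spawn one subagent that acts as skill X on trace T".
RunPass = Callable[[str, str], str]


def schedule(new_traces: List[str], run_pass: RunPass, max_workers: int = 4) -> List[str]:
    """Run Steps 2-5 in pipeline order and return the results in call order."""
    log: List[str] = []

    # Step 2: summarize writes one independent file per trace, so fan out.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        log.extend(pool.map(lambda t: run_pass("summarize", t), new_traces))

    # Steps 3-4: both mutate shared indexes, so strictly one trace at a time.
    for trace in new_traces:
        log.append(run_pass("extract-guidelines", trace))
    for trace in new_traces:
        log.append(run_pass("synthesize-skill", trace))

    # Step 5: one consolidate pass over the whole corpus, run even with no new traces.
    log.append(run_pass("consolidate-guidelines", "<corpus>"))
    # Step 6 (catalog) is run directly by the driver, not as a subagent.
    return log
```

`pool.map` returns results in input order, so the log is deterministic even though the summarize calls overlap.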

## Input

One of:

- a list of trace file paths
- a directory of traces (the skill globs it)
- already-normalized analysis JSON files

…plus a target `--wiki-root` (e.g. `wiki-twobatch-skills`).

### Detecting trace shape (Step 0 dispatch)

Read the top-level JSON keys of each input to classify it:

| Shape | Signature | Conversion |
|---|---|---|
| **bob session JSON** | top-level `sessionId` + `messages` | `bob-trace-converter` |
| **claude stream-json** | JSONL lines with `{"type":"system"/"assistant"/"result"}` | `normalize_stream_json_transcripts.py` |
| **normalized analysis JSON** | top-level `model` + `messages` + `metadata.id` | pass through (no conversion) |
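The dispatch can be sketched as a small classifier. This is an illustrative helper, and `classify_text` and its return labels are hypothetical names. It checks only the signatures in the table. A single JSON object with `sessionId` + `messages` is treated as bob, and one with `model` + `messages` + `metadata.id` as normalized. Anything else is treated as claude stream-json if every non-empty line is a JSON object with a `type` key.

```python
import json
from pathlib import Path

STREAM_TYPES = {"system", "assistant", "result"}


def classify_text(text: str) -> str:
    """Return 'bob', 'normalized', 'claude-stream', or 'unknown'."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        doc = None

    if isinstance(doc, dict):
        if "sessionId" in doc and "messages" in doc:
            return "bob"
        meta = doc.get("metadata")
        if "model" in doc and "messages" in doc and isinstance(meta, dict) and "id" in meta:
            return "normalized"

    # Fall back to JSONL: every non-empty line must parse as an object with "type".
    lines = [ln for ln in text.splitlines() if ln.strip()]
    try:
        records = [json.loads(ln) for ln in lines]
    except json.JSONDecodeError:
        return "unknown"
    if records and all(isinstance(r, dict) and "type" in r for r in records):
        if any(r["type"] in STREAM_TYPES for r in records):
            return "claude-stream"
    return "unknown"


def classify(path: Path) -> str:
    return classify_text(path.read_text(encoding="utf-8"))
```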

## Step 0 — Convert

Write converted output under a stable corpus dir:
`trajectories/normalized/<label>/items/`.

**bob session JSON:**
```bash
NODE_OPTIONS='' node ~/.claude/skills/bob-trace-converter/scripts/convert_bob_trace.mjs \
  <trace.json> --out-dir trajectories/normalized/<label>/items --format both
```
> The `NODE_OPTIONS=''` prefix is required — some shells inject a `--require`
> preload that breaks a bare `node` invocation. Strip it for this call.

The converter writes three files per trace; the ingest pipeline consumes the
`*-openai-chat-completions.analysis.json` one.
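For drivers that shell out from Python rather than bash, the same call can be assembled as an argv plus an environment in which `NODE_OPTIONS` is set to the empty string. That mirrors the `NODE_OPTIONS=''` prefix. This is a sketch: `bob_convert_command` and `pick_analysis_json` are hypothetical helpers, and the converter path and flags are copied from the command above.

```python
import os
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# Path copied from the bash command above; adjust if the skill lives elsewhere.
CONVERTER = Path.home() / ".claude/skills/bob-trace-converter/scripts/convert_bob_trace.mjs"


def bob_convert_command(trace: Path, label: str) -> Tuple[List[str], Dict[str, str]]:
    """Build argv + env for the bob converter with NODE_OPTIONS blanked out."""
    out_dir = Path("trajectories/normalized") / label / "items"
    argv = ["node", str(CONVERTER), str(trace), "--out-dir", str(out_dir), "--format", "both"]
    # Same effect as the NODE_OPTIONS='' prefix: no injected --require preload.
    env = {**os.environ, "NODE_OPTIONS": ""}
    return argv, env


def pick_analysis_json(out_dir: Path) -> Optional[Path]:
    """Of the converter's output files, return the one the ingest pipeline consumes."""
    matches = sorted(out_dir.glob("*-openai-chat-completions.analysis.json"))
    return matches[0] if matches else None
```

The pair would then be passed as `subprocess.run(argv, env=env, check=True)`.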

**claude stream-json:**
```bash
uv run python explorations/agent-wiki/experiments/harness/normalize_stream_json_transcripts.py \
  --in <transcripts-dir> --out trajectories/normalized \
  --label <label> --user-prompt "<the task prompt>"
```

**Already normalized:** skip — use the path as-is.

Collect the resulting list of analysis-JSON paths; this is the trace set the
rest of the pipeline iterates.

## Step 1 — Bootstrap the wiki

If `<wiki-root>/_index.jsonl` does **not** exist:

```bash
mkdir -p <wiki-root>/{summaries,guidelines,tasks,skills}
uv run python explorations/agent-wiki/skills/scripts/build_agent_wiki.py \
  --wiki-root <wiki-root> catalog
```

The first `catalog` seeds `AGENTS.md` and `_config.yaml` from the bundled
defaults and writes empty indexes. Skip this whole step if the wiki already
exists — you're appending to it.
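The existence check and scaffold can be sketched in Python. `needs_bootstrap` and `bootstrap` are hypothetical names. The helper path and the `catalog` subcommand are taken from the commands above. With `run=False` the sketch only creates the directories and returns the command, which is handy for a dry run.

```python
import subprocess
from pathlib import Path
from typing import List

HELPER = "explorations/agent-wiki/skills/scripts/build_agent_wiki.py"
SUBDIRS = ("summaries", "guidelines", "tasks", "skills")


def needs_bootstrap(wiki_root: Path) -> bool:
    """The wiki counts as existing once _index.jsonl has been written."""
    return not (wiki_root / "_index.jsonl").exists()


def bootstrap(wiki_root: Path, run: bool = True) -> List[str]:
    """Create the scaffold dirs (like mkdir -p) and optionally run the seeding catalog."""
    for sub in SUBDIRS:
        (wiki_root / sub).mkdir(parents=True, exist_ok=True)
    argv = ["uv", "run", "python", HELPER, "--wiki-root", str(wiki_root), "catalog"]
    if run:
        subprocess.run(argv, check=True)
    return argv
```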

### Piping JSON to the helper — avoid `echo`

Every `render-*` subcommand reads JSON on stdin. The `echo '<json>' | …`
form in the per-pass skills **breaks when the payload has multi-line
`content`/`narrative` fields**: the literal newlines end up inside JSON string
values as raw control characters, which the JSON parser rejects. Tell every
subagent to write its payload to its own temp file (one path per subagent,
e.g. keyed by trace id, since Step 2 runs in parallel and a shared path would
race) and `cat` it instead:

```bash
cat /tmp/ingest-payload-<sid>.json | uv run python explorations/agent-wiki/skills/scripts/build_agent_wiki.py --wiki-root <wiki-root> render-guidelines
```
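A sketch of the temp-file approach from Python follows, assuming the subagent builds its payload as a dict. `json.dump` escapes embedded newlines as `\n`, so multi-line `content`/`narrative` values stay valid JSON. `tempfile.mkstemp` gives every call a fresh file, which matters while Step 2 runs in parallel. `write_payload` and `render` are hypothetical names; the `render-*` subcommand name comes from whichever per-pass skill is driving.

```python
import json
import os
import subprocess
import tempfile
from typing import Any, Dict

HELPER = "explorations/agent-wiki/skills/scripts/build_agent_wiki.py"


def write_payload(payload: Dict[str, Any]) -> str:
    """Serialize to a unique temp file; json.dump escapes newlines inside strings."""
    fd, path = tempfile.mkstemp(prefix="ingest-payload-", suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, ensure_ascii=False)
    return path


def render(wiki_root: str, subcommand: str, payload_path: str) -> subprocess.CompletedProcess:
    """Feed the payload file to a render-* subcommand on stdin, like `cat file | ...`."""
    argv = ["uv", "run", "python", HELPER, "--wiki-root", wiki_root, subcommand]
    with open(payload_path, "rb") as stdin:
        return subprocess.run(argv, stdin=stdin, check=True)
```

Opening the file as stdin directly avoids the extra `cat` process while keeping the same behavior.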

## Step 1.5 — Skip already-processed traces (pre-flight)

This is what makes re-running the skill on the same source dir cheap. The
helper's `render-*` subcommands skip-if-exists, but only *after* a subagent
has already read the trace and synthesized its output — so the LLM cost is
already spent. Filter **before** spawning any subagent.

For each normalized trace, read its `session_id` — it lives at
`metadata.id` (bob-converted analysis JSON) **or** top-level `session_id`
(claude-normalized). If `<wiki-root>/summaries/<sid>.md` already exists, the
trace was ingested on a prior run → drop it from the work-list. The
surviving **new-trace list** is what Steps 2–4 iterate.
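The pre-flight filter is small enough to sketch directly. This version reads the id from `metadata.id` first and falls back to a top-level `session_id`, matching the two shapes described above. `session_id` and `new_traces` are hypothetical names. Keeping a trace whose id cannot be found, rather than dropping it, is an assumption that errs toward reprocessing.

```python
import json
from pathlib import Path
from typing import Iterable, List, Optional


def session_id(doc: dict) -> Optional[str]:
    """metadata.id for bob-converted JSON, top-level session_id for claude-normalized."""
    meta = doc.get("metadata")
    if isinstance(meta, dict) and meta.get("id"):
        return str(meta["id"])
    sid = doc.get("session_id")
    return str(sid) if sid else None


def new_traces(paths: Iterable[Path], wiki_root: Path) -> List[Path]:
    """Drop every trace whose summaries/<sid>.md already exists."""
    keep: List[Path] = []
    for path in paths:
        sid = session_id(json.loads(path.read_text(encoding="utf-8")))
        if sid is None or not (wiki_root / "summaries" / f"{sid}.md").exists():
            keep.append(path)
    return keep
```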

Compute the

`agent-wiki-consolidate-guidelines` (Skill)

Read all atomic guidelines in wiki-twobatch/guidelines/ and propose themed clusters that group near-duplicates. Writes cluster pages and updates _config.yaml; originals are preserved with a `superseded_by:` backref.

`agent-wiki-consult` (Skill)

Consult an agent-wiki for guidelines relevant to the task at hand. The wiki itself documents how to retrieve from it (AGENTS.md). Use this skill once you know what task or sub-task you're about to do — not at session start.

`agent-wiki-extract-guidelines` (Skill)

Read a normalized Claude Code trajectory JSON and extract reusable guidelines into wiki-twobatch/guidelines/. Use when mining saved trajectories for reusable lessons.

`agent-wiki-summarize` (Skill)

Read a normalized Claude Code trajectory JSON and write an episodic summary page to wiki-twobatch/summaries/. Use when summarizing one or more saved trajectories into the agent wiki.

`agent-wiki-synthesize-skill` (Skill)

Read a normalized Claude Code trajectory JSON and produce a wiki-resident SKILL.md page that future agents can invoke. Use when a trajectory captured a non-trivial successful workflow worth promoting from a free-text guideline to an executable, callable artifact.

`agent-wiki-tasks` (Skill)

Discover task families across summaries and write per-family comparison pages with findings narrative. Updates wiki-twobatch/_config.yaml task definitions and writes tasks/<slug>__task.md.

`evolve-lite:learn` (Skill)

Must be used near the end of any non-trivial turn that produced potentially reusable tools, guidance, errors, workarounds, or workflows, so those lessons are saved for future turns.

`evolve-lite:provenance` (Skill)

Analyze saved trajectories and recall audit events offline to record whether recalled guidelines influenced completed sessions.