Skill · 95 repo stars · updated 3d ago

agent-wiki-ingest

agent-wiki-ingest orchestrates a complete pipeline for converting raw agent execution traces into a structured wiki in a single pass. It sequences summarization, guideline extraction, skill synthesis, and consolidation across multiple traces while maintaining idempotency and enabling parallel processing where safe. Use this when you have a batch of traces to transform into a unified wiki with proper cross-trajectory clustering and consolidation in the correct order.

Repository: altk-evolve

Install in Claude Code

git clone --depth 1 https://github.com/AgentToolkit/altk-evolve /tmp/agent-wiki-ingest && cp -r /tmp/agent-wiki-ingest/explorations/agent-wiki/skills/agent-wiki-ingest ~/.claude/skills/agent-wiki-ingest

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Agent Wiki — Ingest (end-to-end orchestrator)

## Overview

This is the **one-pass entry point** for turning a batch of raw trajectories
into a fully-built wiki. It orchestrates the rest of the `agent-wiki` family
in the right order so no pass is skipped — in particular the
cross-trajectory **consolidation** pass, which is easy to forget when each
skill is invoked by hand.

You — the driving agent — run this by **spawning one subagent per
(trace × pass)**, not by doing the work inline. That keeps your own context
small (you never load every trace's full JSON) and lets independent passes
run in parallel. Each subagent acts as the corresponding single-purpose
skill (`agent-wiki-summarize`, `-extract-guidelines`, `-synthesize-skill`,
`-consolidate-guidelines`); this skill only sequences them and passes the
per-trace adapter notes.

The pipeline:

```
0.  Convert    raw bob / claude traces → normalized analysis JSON   (skip if already normalized)
1.  Bootstrap  create wiki scaffold + seed catalog                  (skip if wiki exists)
1.5 Skip       drop traces whose summaries/<sid>.md already exists   [pre-flight — idempotency]
2.  Summarize  1 subagent / new-trace → summaries/<sid>.md          [PARALLEL]
3.  Extract    1 subagent / new-trace → guidelines/*.md (+tags)     [SEQUENTIAL]
4.  Synthesize 1 subagent / new-trace → skills/<slug>/ --archive-covered  [SEQUENTIAL]
5.  Consolidate 1 subagent over the whole corpus → cluster pages    [SINGLE — MANDATORY]
6.  Catalog    final bookkeeping → indexes, used-by, priority       [you run this directly]
```

**Idempotent by default.** Re-running on the same source dir reprocesses
nothing: Step 1.5 filters out every trace that already has a summary page,
so Steps 2–4 only touch genuinely new traces. The consolidate + catalog tail
always runs (it's cheap and self-idempotent). To force a redo of an
already-ingested trace, keep it in the work-list (exempt it from the
Step 1.5 filter) and pass `--rewrite` to its `render-*` calls.

**Why this order.** `synthesize-skill` runs *before* `consolidate-guidelines`
so skills claim recipe-level territory first (and archive the atomics they
cover via `--archive-covered`); consolidation then clusters only the
*surviving* atomics. This matches the consolidate skill's own rule — "don't
propose clusters that overlap a skill's territory."

**Why parallel vs sequential.** Summarize writes one independent file per
trace (`summaries/<sid>.md`) → safe to parallelize. Extract and synthesize
both mutate shared state (`guidelines/_id_index.json`, `skills/_id_index.json`,
`_config.yaml`, and the `_archived/` moves) → run them **one trace at a
time** to avoid lost-update races.
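
The scheduling rule above can be sketched as follows. This is a minimal illustration, not the skill's actual driver: `spawn_subagent` is a hypothetical callable standing in for however you launch a subagent, and the pass names are shortened labels. It also assumes Step 3 finishes for every trace before Step 4 starts, as the pipeline listing suggests.

```python
from concurrent.futures import ThreadPoolExecutor


def run_passes(new_traces, spawn_subagent, max_workers=4):
    """Run Steps 2-4 over new_traces.

    spawn_subagent(pass_name, trace) is a hypothetical hook; it is assumed
    to block until the subagent for that (trace x pass) has finished.
    """
    # Step 2: each summarize subagent writes only its own summaries/<sid>.md,
    # so the calls are independent and can run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(lambda trace: spawn_subagent("summarize", trace), new_traces))

    # Steps 3 and 4 mutate shared index files, so run one trace at a time.
    for trace in new_traces:
        spawn_subagent("extract-guidelines", trace)
    for trace in new_traces:
        spawn_subagent("synthesize-skill", trace)
```

Leaving the `with` block waits for every summarize call, so no extract subagent starts while a summary is still being written.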

## Input

One of:

- a list of trace file paths
- a directory of traces (the skill globs it)
- already-normalized analysis JSON files

…plus a target `--wiki-root` (e.g. `wiki-twobatch-skills`).

### Detecting trace shape (Step 0 dispatch)

Read the top-level JSON keys of each input to classify it:

| Shape | Signature | Conversion |
|---|---|---|
| **bob session JSON** | top-level `sessionId` + `messages` | `bob-trace-converter` |
| **claude stream-json** | JSONL lines with `{"type":"system"/"assistant"/"result"}` | `normalize_stream_json_transcripts.py` |
| **normalized analysis JSON** | top-level `model` + `messages` + `metadata.id` | pass through (no conversion) |
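
A minimal sketch of this dispatch, assuming each input fits in memory and that inspecting the first non-empty line is enough to recognize stream-json. The function name and the returned labels are illustrative, not part of the skill.

```python
import json
from pathlib import Path


def classify_trace(path):
    """Return 'bob', 'normalized', 'claude-stream', or 'unknown' for one file."""
    text = Path(path).read_text(encoding="utf-8")
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        doc = None  # not a single JSON document; may still be JSONL

    if isinstance(doc, dict):
        if "sessionId" in doc and "messages" in doc:
            return "bob"
        metadata = doc.get("metadata")
        if (
            "model" in doc
            and "messages" in doc
            and isinstance(metadata, dict)
            and "id" in metadata
        ):
            return "normalized"
        # A one-line JSONL file also parses as a dict; fall through.

    # Assumption: only the first non-empty line is inspected.
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            return "unknown"
        if isinstance(event, dict) and event.get("type") in {"system", "assistant", "result"}:
            return "claude-stream"
        return "unknown"
    return "unknown"
```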

## Step 0 — Convert

Write converted output under a stable corpus dir:
`trajectories/normalized/<label>/items/`.

**bob session JSON:**
```bash
NODE_OPTIONS='' node ~/.claude/skills/bob-trace-converter/scripts/convert_bob_trace.mjs \
  <trace.json> --out-dir trajectories/normalized/<label>/items --format both
```
> The `NODE_OPTIONS=''` prefix is required: some environments export a
> `NODE_OPTIONS` value with a `--require` preload that breaks a bare `node`
> invocation. The prefix clears it for this call only.

The converter writes three files per trace; the ingest pipeline consumes the
`*-openai-chat-completions.analysis.json` one.

**claude stream-json:**
```bash
uv run python explorations/agent-wiki/experiments/harness/normalize_stream_json_transcripts.py \
  --in <transcripts-dir> --out trajectories/normalized \
  --label <label> --user-prompt "<the task prompt>"
```

**Already normalized:** skip — use the path as-is.

Collect the resulting list of analysis-JSON paths; this is the trace set the
rest of the pipeline iterates.

## Step 1 — Bootstrap the wiki

If `<wiki-root>/_index.jsonl` does **not** exist:

```bash
mkdir -p <wiki-root>/{summaries,guidelines,tasks,skills}
uv run python explorations/agent-wiki/skills/scripts/build_agent_wiki.py \
  --wiki-root <wiki-root> catalog
```

The first `catalog` seeds `AGENTS.md` and `_config.yaml` from the bundled
defaults and writes empty indexes. Skip this whole step if the wiki already
exists — you're appending to it.

### Piping JSON to the helper — avoid `echo`

Every `render-*` subcommand reads JSON on stdin. The `echo '<json>' | …`
form in the per-pass skills **breaks when the payload has multi-line
`content`/`narrative` fields**: a literal newline inside a JSON string is an
invalid control character, so the payload fails to parse. Tell every
subagent to write its payload to a temp file and `cat` it instead:

```bash
cat /tmp/ingest-payload.json | uv run python explorations/agent-wiki/skills/scripts/build_agent_wiki.py --wiki-root <wiki-root> render-guidelines
```
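
If a subagent builds its payload in Python, the same idea can be sketched like this. Both helper names are hypothetical and the payload shape is up to the per-pass skill. The relevant point is that `json.dump` escapes newlines inside strings as `\n`, so multi-line fields survive.

```python
import json
import subprocess
import tempfile


def write_payload(payload):
    """Serialize payload to a temp JSON file and return its path."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", prefix="ingest-payload-", delete=False, encoding="utf-8"
    ) as fh:
        json.dump(payload, fh)  # newlines inside strings are written as \n escapes
        return fh.name


def pipe_payload(path, command):
    """Run command (an argv list) with the payload file connected to stdin."""
    with open(path, "rb") as fh:
        return subprocess.run(command, stdin=fh, check=True)
```

For example, `pipe_payload(path, ["uv", "run", "python", "explorations/agent-wiki/skills/scripts/build_agent_wiki.py", "--wiki-root", "<wiki-root>", "render-guidelines"])` mirrors the shell command above without passing the JSON through shell quoting.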

## Step 1.5 — Skip already-processed traces (pre-flight)

This is what makes re-running the skill on the same source dir cheap. The
helper's `render-*` subcommands skip-if-exists, but only *after* a subagent
has already read the trace and synthesized its output — so the LLM cost is
already spent. Filter **before** spawning any subagent.

For each normalized trace, read its `session_id` — it lives at
`metadata.id` (bob-converted analysis JSON) **or** top-level `session_id`
(claude-normalized). If `<wiki-root>/summaries/<sid>.md` already exists, the
trace was ingested on a prior run → drop it from the work-list. The
surviving **new-trace list** is what Steps 2–4 iterate.
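
A minimal sketch of this pre-flight filter, under two labeled assumptions: `metadata.id` is checked before the top-level `session_id`, and a trace with no recoverable id stays in the work-list rather than being dropped silently. The function names are illustrative.

```python
import json
from pathlib import Path


def session_id(trace_path):
    """Return the trace's session id, or None if neither location has one."""
    doc = json.loads(Path(trace_path).read_text(encoding="utf-8"))
    metadata = doc.get("metadata")
    if isinstance(metadata, dict) and metadata.get("id"):
        return metadata["id"]  # bob-converted analysis JSON
    return doc.get("session_id")  # claude-normalized


def new_traces(trace_paths, wiki_root):
    """Drop every trace whose summaries/<sid>.md already exists."""
    summaries = Path(wiki_root) / "summaries"
    keep = []
    for path in trace_paths:
        sid = session_id(path)
        # Assumption: traces without an id are kept so they are not lost.
        if sid is None or not (summaries / f"{sid}.md").exists():
            keep.append(path)
    return keep
```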

Compute the