ClaudeWave
Skill · 6.6k repo stars · updated 16d ago

wiki-retrieve

**wiki-retrieve** is a chunk-level hybrid retrieval system for the Compound Vault. It combines contextual prefixes, BM25 sparse search, and cosine reranking, and replaces the v1.6 page-level read order. It is based on Anthropic's September 2024 Contextual Retrieval research, which reports 35–67% fewer retrieval failures depending on which of these techniques are combined. The skill is opt-in via a setup script, and its data privacy controls allow fully on-machine operation using synthetic prefixes and local Ollama reranking.

Install in Claude Code
git clone --depth 1 https://github.com/AgriciDaniel/claude-obsidian /tmp/wiki-retrieve && mkdir -p ~/.claude/skills && cp -r /tmp/wiki-retrieve/skills/wiki-retrieve ~/.claude/skills/wiki-retrieve
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# wiki-retrieve: Hybrid Retrieval over the Vault

The v1.6 query path was `Read(hot.md) → Read(index.md) → Read(3-5 pages) → synthesize`. It worked, but page-level granularity loses to chunk-level granularity whenever the answer lives in a specific passage rather than a whole page. The v1.7 `wiki-retrieve` skill is the chunk-level upgrade: it is opt-in and feature-gated, and it replaces nothing if you don't run the setup.

**Origin**: This skill is original to claude-obsidian. There is no upstream kepano equivalent. The technique is from [Anthropic's Sept 2024 Contextual Retrieval research](https://www.anthropic.com/news/contextual-retrieval) — we implement it as agent-skill plumbing.

---

## Data privacy (v1.7.1+)

Tier 1 (Anthropic API) and tier 2 (claude CLI subprocess) of the contextual-prefix generator send **wiki page bodies off-machine**. As of v1.7.1, both tiers are GATED behind explicit user consent at two layers:

- `scripts/contextual-prefix.py --allow-egress` (default off). Without the flag, `pick_prefix_tier()` returns `"synthetic"` regardless of `ANTHROPIC_API_KEY` or `claude` binary presence.
- `bin/setup-retrieve.sh` prompts before any non-synthetic Stage 1 run; default is abort.

To run fully on-machine (tier 3 synthetic prefix + local ollama rerank), use `bash bin/setup-retrieve.sh --no-llm`. This is also the effective behavior if you decline the consent prompt or omit `--allow-egress`.

The guard mirrors `scripts/tiling-check.py:351` `--allow-remote-ollama`. v1.6 vaults that never provisioned this skill see zero behavior change.
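
The egress gate described above can be sketched as follows. This is a minimal illustration, not the code in `scripts/contextual-prefix.py`: only the function name `pick_prefix_tier()` and the `"synthetic"` return value come from the text, while the signature, the injected `env` and `which` parameters, and the other two tier labels are assumptions made so the logic can be exercised without touching the real environment.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def pick_prefix_tier(
    allow_egress: bool = False,
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Pick a prefix tier; page bodies never leave the machine without consent."""
    env = os.environ if env is None else env
    if not allow_egress:
        # Gate: without explicit consent, credentials and binaries are ignored.
        return "synthetic"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic-api"  # hypothetical label for tier 1
    if which("claude"):
        return "claude-cli"  # hypothetical label for tier 2
    return "synthetic"  # tier 3: nothing else is available
```

The point of the design is that the consent check comes first, so the presence of an API key or a `claude` binary can never select an off-machine tier on its own.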

---

## Architecture

```
INGEST (one-time, then incremental):

  wiki/<page>.md
       │
       ▼
  scripts/contextual-prefix.py
       │   ├─ chunk on paragraph boundaries (~500 token target, 200 char overlap)
       │   ├─ generate 1-2 sentence prefix per chunk
       │   │     tier 1: ANTHROPIC_API_KEY → Anthropic API (Haiku, prompt-cached
       │   │                                 when body ≥ ~16 KB / Haiku 4.5 floor)
       │   │     tier 2: `claude` on PATH  → claude -p subprocess
       │   │     tier 3: synthetic         → frontmatter title + first paragraph
       │   └─ write .vault-meta/chunks/<address>/chunk-NNN.json
       │
       ▼
  scripts/bm25-index.py build
       └─ inverted index over chunks' contextualized_text → .vault-meta/bm25/index.json

QUERY:

  query string
       │
       ▼
  scripts/retrieve.py "<query>" --top 5
       ├─ bm25-index.py query "<query>" --top 20    (sparse candidate set)
       ├─ rerank.py "<query>" --candidates -        (dense rerank via ollama cosine)
       │     cosine(query_embedding, chunk_embedding)
       │     embeddings cached in .vault-meta/embed-cache.json keyed by body_hash
       └─ dedupe by page-address, return top-N candidates with absolute_path
       │
       ▼
  caller (wiki-query / autoresearch) reads the cited pages and synthesizes
```
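
The chunking step in the ingest diagram can be sketched roughly like this. The 500-token target and 200-character overlap come from the diagram; the four-characters-per-token estimate, the function names, and the decision to leave oversized single paragraphs unsplit are assumptions, and the real script may differ on all three.

```python
import re
from typing import List

TARGET_TOKENS = 500  # from the diagram: ~500 token target
OVERLAP_CHARS = 200  # from the diagram: 200 char overlap


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def chunk_paragraphs(body: str) -> List[str]:
    """Greedily pack paragraphs into chunks, carrying a character overlap."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and estimate_tokens(candidate) > TARGET_TOKENS:
            chunks.append(current)
            # Start the next chunk with the tail of the finished one.
            current = f"{current[-OVERLAP_CHARS:]}\n\n{para}"
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk would then receive its 1-2 sentence prefix and be written out as a `chunk-NNN.json` file.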

---

## Feature gating

Other skills must detect this skill before using it. The canonical detection:

```bash
if [ -x scripts/retrieve.py ] && [ -d .vault-meta/chunks ] && \
   [ -f .vault-meta/bm25/index.json ]; then
  echo "wiki-retrieve installed"
else
  echo "fallback: legacy hot→index→drill"
fi
```

If detection fails, callers MUST fall back to the v1.6 read order. This skill never breaks the base plugin.
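
For callers written in Python, the same three checks might look like the sketch below. The function name and the `vault_root` parameter are hypothetical; the paths are the ones used in the shell detection above.

```python
import os
from pathlib import Path


def wiki_retrieve_installed(vault_root: str = ".") -> bool:
    """Mirror the shell detection: executable script, chunk dir, BM25 index."""
    root = Path(vault_root)
    return (
        os.access(root / "scripts" / "retrieve.py", os.X_OK)
        and (root / ".vault-meta" / "chunks").is_dir()
        and (root / ".vault-meta" / "bm25" / "index.json").is_file()
    )
```

A caller would branch on the result and use the v1.6 read order whenever it returns `False`.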

---

## Setup

```bash
bash bin/setup-retrieve.sh
```

What it does, in order:
1. Sanity-checks the 4 scripts are present and executable.
2. Creates `.vault-meta/chunks/` and `.vault-meta/bm25/`.
3. Probes ollama at `http://127.0.0.1:11434` for `nomic-embed-text` (rerank prerequisite). Reports status; does not install.
4. Reports which contextual-prefix tier will be used (Anthropic API / claude CLI / synthetic).
5. Runs `contextual-prefix.py --all` to chunk + contextualize every wiki page.
6. Runs `bm25-index.py build`.
7. Smoke-tests `retrieve.py` against the query "wiki".

Flags:
- `--check` — diagnostics only, no provisioning.
- `--no-llm` — force tier-3 synthetic prefix (cheapest, zero LLM dependency).
- `--rebuild` — re-chunk every page even if body_hash matches.
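
The ollama probe in step 3 could be approximated as below. This is a sketch under stated assumptions rather than what `bin/setup-retrieve.sh` actually runs: it uses Ollama's `GET /api/tags` endpoint, which lists locally pulled models, and the function names and status strings are made up for illustration.

```python
import json
import urllib.request
from typing import Any, Dict

OLLAMA_URL = "http://127.0.0.1:11434"  # address from step 3
MODEL = "nomic-embed-text"  # model from step 3


def has_model(tags: Dict[str, Any], model: str = MODEL) -> bool:
    """True if an /api/tags payload lists the model under any tag."""
    names = [m.get("name", "") for m in tags.get("models", [])]
    return any(n == model or n.startswith(model + ":") for n in names)


def probe_ollama(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> str:
    """Report status only; this never installs or pulls anything."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            tags = json.load(resp)
    except (OSError, ValueError):
        # Connection failures and malformed JSON both count as unavailable.
        return "ollama unreachable"
    return "rerank ready" if has_model(tags) else f"{MODEL} not pulled"
```

Keeping the payload check in a separate pure function makes it possible to test the model-matching logic without a running ollama instance.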

---

## Cost ceiling

Per Anthropic's published research, contextual-prefix generation costs approximately **$12 per 1,000 documents** with Haiku + prompt caching. Treating each chunk as one document, a 100-page vault with ~3 chunks per page (300 chunks) comes to ~$3.60 one-time (300 × $12 / 1,000). Incremental updates are much cheaper, because only changed pages are re-processed.
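
The ~$3.60 estimate follows if each chunk is billed as one document at the quoted rate. A small calculator under that assumption, with hypothetical names, makes it easy to plug in your own vault size:

```python
RATE_PER_1000 = 12.00  # USD per 1,000 units, the rate quoted above


def prefix_cost(pages: int, chunks_per_page: float,
                rate_per_1000: float = RATE_PER_1000) -> float:
    """One-time prefix cost in USD, counting each chunk as one billed unit."""
    chunks = pages * chunks_per_page
    return round(chunks * rate_per_1000 / 1000, 2)


print(prefix_cost(100, 3))  # 3.6
```

Actual spend depends on page length and cache hit rate, so treat the result as a planning ceiling rather than a quote.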

If you want to validate cost before running on a large vault:

```bash
bash bin/setup-retrieve.sh --no-llm   # provision with tier-3 synthetic prefix
# inspect retrieval quality manually; if insufficient, re-run without --no-llm
```

The `claude-cli` subprocess tier (no API key) incurs no API charges but is slower (~3-10s per chunk, depending on Haiku availability).

---

## Skill commands (recipe)

These are the commands wiki-query and autoresearch will execute when wiki-retrieve is feature-detected. Other skills should mirror this pattern.

### Standard retrieve
```bash
python3 scripts/retrieve.py "your question here" --top 5
```
Output: JSON with a `candidates` array. Each candidate carries the `absolute_path` of its source page; the caller reads that page (using the v1.7 transport selector) and synthesizes the answer.
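
A Python caller might consume that output as sketched here. Only the `candidates` array and the `absolute_path` field are taken from the description above; the helper names are hypothetical, and the sketch assumes `retrieve.py` prints a single JSON object to stdout.

```python
import json
import subprocess
from typing import List


def candidate_paths(retrieve_output: str) -> List[str]:
    """Extract unique source-page paths, preserving rank order."""
    payload = json.loads(retrieve_output)
    seen, paths = set(), []
    for cand in payload.get("candidates", []):
        path = cand.get("absolute_path")
        if path and path not in seen:
            seen.add(path)
            paths.append(path)
    return paths


def retrieve(query: str, top: int = 5) -> List[str]:
    """Run the standard retrieve command and return the pages to read."""
    result = subprocess.run(
        ["python3", "scripts/retrieve.py", query, "--top", str(top)],
        capture_output=True, text=True, check=True,
    )
    return candidate_paths(result.stdout)
```

Passing the query as a list element rather than through a shell string avoids quoting problems when the question contains quotes or other shell metacharacters.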

### BM25-only (skip rerank)
```bash
python3 scripts/retrieve.py "query" --top 5 --no-rerank
```
Faster (no ollama call); lower quality.

### Explain mode (debugging)
```bash
python3 scripts/retrieve.py "query" --top 5 --explain
```
Adds an `explain` block with per-stage diagnostics (BM25 candidate count, dedupe size, etc.).

### Direct BM25 inspection
```bash
python3 scripts/bm25-index.py query "query" --top 10
python3 scripts/bm25-index.py stats
```
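
To help interpret the scores these commands print, here is a compact Okapi BM25 ranker over an in-memory set of chunks. It is an illustrative sketch, not the contents of `scripts/bm25-index.py`: the tokenizer, the `K1` and `B` values (common defaults), and the smoothed IDF variant are all assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

K1, B = 1.5, 0.75  # common BM25 defaults; assumed, not read from the script


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, chunks: Dict[str, str], top: int = 10) -> List[Tuple[str, float]]:
    """Score every chunk against the query and return the best `top`."""
    docs = {cid: Counter(tokenize(text)) for cid, text in chunks.items()}
    n = len(docs)
    avg_len = sum(sum(tf.values()) for tf in docs.values()) / max(n, 1)
    # Document frequency: in how many chunks each term appears.
    df = Counter(term for tf in docs.values() for term in tf)
    scores = []
    for cid, tf in docs.items():
        length = sum(tf.values())
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + K1 * (1 - B + B * length / avg_len)
            score += idf * tf[term] * (K1 + 1) / norm
        scores.append((cid, score))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top]
```

In the real pipeline the indexed text is each chunk's `contextualized_text`, so the generated prefix contributes terms to the match as well as the chunk body.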

### Rerank strategy probe
```bash
python3 scripts/rerank.py "query" --peek
```
Reports which strategy will run (cosine via ollama / no-op).
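
When the cosine strategy is active, the rerank and dedupe steps amount to something like the following sketch. The candidate fields `address` and `embedding` are hypothetical names, and the vectors are assumed to have been fetched from ollama (or the embedding cache) beforehand; the actual `rerank.py` may structure this differently.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # zero vectors score 0 instead of raising


def rerank(query_vec: Sequence[float], candidates: List[Dict], top: int = 5) -> List[Dict]:
    """Order candidates by cosine similarity, keeping one chunk per page."""
    ranked = sorted(
        candidates,
        key=lambda c: cosine(query_vec, c["embedding"]),
        reverse=True,
    )
    seen, out = set(), []
    for cand in ranked:
        if cand["address"] in seen:  # dedupe by page address
            continue
        seen.add(cand["address"])
        out.append(cand)
    return out[:top]
```

Deduplicating after sorting means each page is represented by its best-matching chunk.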

---

## Integration with wiki-query

After this skill is installed, `skills/wiki-query/SKILL.md` standard and deep modes will:

1. Read `wiki/hot.md` (always — quick context).
2. Call `python3 scripts/retrieve.py "<