wiki-retrieve
**wiki-retrieve** is a chunk-level hybrid retrieval system for the Compound Vault that combines contextual prefixes, BM25 sparse search, and cosine reranking to replace the v1.6 page-level read order. It is based on Anthropic's September 2024 Contextual Retrieval research, which reports a 35–67% reduction in retrieval failures for the technique. The skill is opt-in via a setup script, and its data privacy controls allow fully on-machine operation using synthetic prefixes and local Ollama reranking.
git clone --depth 1 https://github.com/AgriciDaniel/claude-obsidian /tmp/wiki-retrieve && cp -r /tmp/wiki-retrieve/skills/wiki-retrieve ~/.claude/skills/wiki-retrieve
SKILL.md
# wiki-retrieve: Hybrid Retrieval over the Vault
The v1.6 query path was `Read(hot.md) → Read(index.md) → Read(3-5 pages) → synthesize`. It worked, but page-level granularity loses to chunk-level granularity whenever the answer lives in a specific passage rather than a whole page. The v1.7 `wiki-retrieve` skill is the chunk-level upgrade: it is opt-in and feature-gated, and it replaces nothing unless you run the setup.
**Origin**: This skill is original to claude-obsidian. There is no upstream kepano equivalent. The technique is from [Anthropic's Sept 2024 Contextual Retrieval research](https://www.anthropic.com/news/contextual-retrieval) — we implement it as agent-skill plumbing.
---
## Data privacy (v1.7.1+)
Tier 1 (Anthropic API) and tier 2 (claude CLI subprocess) of the contextual-prefix generator send **wiki page bodies off-machine**. As of v1.7.1, both tiers are GATED behind explicit user consent at two layers:
- `scripts/contextual-prefix.py --allow-egress` (default off). Without the flag, `pick_prefix_tier()` returns `"synthetic"` regardless of `ANTHROPIC_API_KEY` or `claude` binary presence.
- `bin/setup-retrieve.sh` prompts before any non-synthetic Stage 1 run; default is abort.
To run fully on-machine (tier 3 synthetic prefix + local ollama rerank), use `bash bin/setup-retrieve.sh --no-llm`. This is also the effective behavior if you decline the consent prompt or omit `--allow-egress`.
The guard mirrors `scripts/tiling-check.py:351` `--allow-remote-ollama`. v1.6 vaults that never provisioned this skill see zero behavior change.
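The consent gate described in this section can be sketched as follows. This is a minimal illustration, not the code in `scripts/contextual-prefix.py`: the function name `pick_prefix_tier_sketch`, its parameters, and the tier labels `"anthropic-api"` and `"claude-cli"` are assumptions made for the example. Only the `"synthetic"` return value and the precedence of the egress flag come from the text above.

```python
import os
import shutil


def pick_prefix_tier_sketch(allow_egress, env=None, which=shutil.which):
    """Hypothetical tier picker: stays on-machine unless egress is allowed."""
    env = os.environ if env is None else env
    # Without explicit consent, credentials and binaries are ignored entirely.
    if not allow_egress:
        return "synthetic"
    # Tier 1: an API key is present (label is an assumption).
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic-api"
    # Tier 2: the `claude` binary is on PATH (label is an assumption).
    if which("claude"):
        return "claude-cli"
    # Tier 3: nothing else is available.
    return "synthetic"
```

The point of the ordering is that the consent check runs before any credential or binary lookup, so a configured API key alone can never trigger off-machine traffic.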
---
## Architecture
```
INGEST (one-time, then incremental):
wiki/<page>.md
│
▼
scripts/contextual-prefix.py
│ ├─ chunk on paragraph boundaries (~500 token target, 200 char overlap)
│ ├─ generate 1-2 sentence prefix per chunk
│ │ tier 1: ANTHROPIC_API_KEY → Anthropic API (Haiku, prompt-cached
│ │ when body ≥ ~16 KB / Haiku 4.5 floor)
│ │ tier 2: `claude` on PATH → claude -p subprocess
│ │ tier 3: synthetic → frontmatter title + first paragraph
│ └─ write .vault-meta/chunks/<address>/chunk-NNN.json
│
▼
scripts/bm25-index.py build
└─ inverted index over chunks' contextualized_text → .vault-meta/bm25/index.json
QUERY:
query string
│
▼
scripts/retrieve.py "<query>" --top 5
├─ bm25-index.py query "<query>" --top 20 (sparse candidate set)
├─ rerank.py "<query>" --candidates - (dense rerank via ollama cosine)
│ cosine(query_embedding, chunk_embedding)
│ embeddings cached in .vault-meta/embed-cache.json keyed by body_hash
└─ dedupe by page-address, return top-N candidates with absolute_path
│
▼
caller (wiki-query / autoresearch) reads the cited pages and synthesizes
```
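To make the chunking step in the diagram concrete, here is a rough sketch of paragraph-boundary chunking with a character overlap. It is not the logic in `scripts/contextual-prefix.py`. The function name, the blank-line paragraph split, and the 4-characters-per-token approximation (so ~500 tokens becomes 2,000 characters) are all assumptions for illustration.

```python
def chunk_paragraphs(body, target_chars=2000, overlap_chars=200):
    """Hypothetical chunker: pack paragraphs up to a size target, with overlap."""
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    chunks = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > target_chars:
            # Close the current chunk and carry its tail forward as overlap.
            chunks.append(current)
            current = f"{current[-overlap_chars:]}\n\n{para}"
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

In this sketch a single paragraph longer than the target is kept whole rather than split mid-paragraph, which favors readable chunks over strict size limits.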
---
## Feature gating
Other skills must detect this skill before using it. The canonical detection:
```bash
[ -x scripts/retrieve.py ] && [ -d .vault-meta/chunks ] && \
[ -f .vault-meta/bm25/index.json ] && \
echo "wiki-retrieve installed" || echo "fallback: legacy hot→index→drill"
```
If detection fails, callers MUST fall back to the v1.6 read order. This skill never breaks the base plugin.
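For callers written in Python rather than shell, the same three checks might look like the sketch below. The function name `wiki_retrieve_installed` and the `vault_root` parameter are hypothetical; the paths are the ones used in the shell detection above.

```python
import os
from pathlib import Path


def wiki_retrieve_installed(vault_root):
    """Hypothetical helper mirroring the shell detection checks."""
    root = Path(vault_root)
    script = root / "scripts" / "retrieve.py"
    return (
        script.is_file()
        and os.access(script, os.X_OK)  # same intent as `[ -x ... ]`
        and (root / ".vault-meta" / "chunks").is_dir()
        and (root / ".vault-meta" / "bm25" / "index.json").is_file()
    )
```

A caller would branch on the result and use the v1.6 read order whenever it is `False`.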
---
## Setup
```bash
bash bin/setup-retrieve.sh
```
What it does, in order:
1. Sanity-checks the 4 scripts are present and executable.
2. Creates `.vault-meta/chunks/` and `.vault-meta/bm25/`.
3. Probes ollama at `http://127.0.0.1:11434` for `nomic-embed-text` (rerank prerequisite). Reports status; does not install.
4. Reports which contextual-prefix tier will be used (Anthropic API / claude CLI / synthetic).
5. Runs `contextual-prefix.py --all` to chunk + contextualize every wiki page.
6. Runs `bm25-index.py build`.
7. Smoke-tests `retrieve.py` against the query "wiki".
Flags:
- `--check` — diagnostics only, no provisioning.
- `--no-llm` — force tier-3 synthetic prefix (cheapest, zero LLM dependency).
- `--rebuild` — re-chunk every page even if body_hash matches.
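The `body_hash` comparison that `--rebuild` bypasses could work roughly as sketched here. The hashing algorithm is not stated in this document, so SHA-256 is an assumption, as are the function names and the idea that the stored hash is passed in as a string.

```python
import hashlib
from typing import Optional


def body_hash(body):
    """Hypothetical content hash; SHA-256 is an assumption."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def needs_rechunk(body, stored_hash: Optional[str], rebuild=False):
    """Re-chunk when forced, when no hash is stored, or when the body changed."""
    return rebuild or stored_hash != body_hash(body)
```

Under this scheme an unchanged page is skipped on incremental runs, which is why later runs cost far less than the initial one.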
---
## Cost ceiling
Per Anthropic's published research, contextual-prefix generation costs approximately **$12 per 1,000 documents** with Haiku + prompt caching. Applying that rate per chunk, a 100-page vault with ~3 chunks per page is 300 chunks × $0.012 ≈ $3.60 one-time. Incremental updates are much cheaper because only changed pages are re-processed.
If you want to validate cost before running on a large vault:
```bash
bash bin/setup-retrieve.sh --no-llm # provision with tier-3 synthetic prefix
# inspect retrieval quality manually; if insufficient, re-run without --no-llm
```
The `claude-cli` subprocess tier (no API key) is free in dollar terms but slower (~3-10s per chunk, depending on Haiku availability).
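The one-time estimate in this section is simple arithmetic, and a small helper makes it easy to re-run for other vault sizes. The function name and parameters are hypothetical, and the default rate is the $12 per 1,000 units quoted above, applied per chunk as in the 100-page example.

```python
def estimate_prefix_cost(pages, chunks_per_page, usd_per_1000_units=12.0):
    """Hypothetical estimator: chunk count times the quoted per-unit rate."""
    units = pages * chunks_per_page
    return units * usd_per_1000_units / 1000


# The 100-page, ~3-chunks-per-page example from this section.
print(f"${estimate_prefix_cost(100, 3):.2f}")
```

Treat the result as an upper-bound planning number, not a quote; actual spend depends on page length and cache hits.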
---
## Skill commands (recipe)
These are the commands wiki-query and autoresearch will execute when wiki-retrieve is feature-detected. Other skills should mirror this pattern.
### Standard retrieve
```bash
python3 scripts/retrieve.py "your question here" --top 5
```
Output: JSON with `candidates` array. Each candidate has `absolute_path` to the source page; caller reads that page (using the v1.7 transport selector) and synthesizes.
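A caller might consume that output along these lines. This sketch assumes the top level is a JSON object holding the `candidates` array; the function name `candidate_paths` is hypothetical, and no field other than `absolute_path` is relied on.

```python
import json


def candidate_paths(retrieve_stdout):
    """Hypothetical helper: unique source-page paths, in ranked order."""
    payload = json.loads(retrieve_stdout)
    seen = set()
    paths = []
    for cand in payload.get("candidates", []):
        path = cand.get("absolute_path")
        if path and path not in seen:
            seen.add(path)
            paths.append(path)
    return paths
```

The input string would typically be the captured stdout of the `retrieve.py` command shown above, for example via `subprocess.run(..., capture_output=True, text=True)`.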
### BM25-only (skip rerank)
```bash
python3 scripts/retrieve.py "query" --top 5 --no-rerank
```
Faster (no ollama call); lower quality.
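For readers unfamiliar with what the sparse stage computes, the following is a textbook Okapi BM25 scorer. It is not the implementation in `scripts/bm25-index.py`: the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the function name are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Textbook BM25 over whitespace tokens; returns one score per document."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

BM25 rewards exact term matches and penalizes long documents, which is why it works well as a cheap candidate generator but misses paraphrases that the rerank stage can recover.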
### Explain mode (debugging)
```bash
python3 scripts/retrieve.py "query" --top 5 --explain
```
Adds an `explain` block with per-stage diagnostics (BM25 candidate count, dedupe size, etc.).
### Direct BM25 inspection
```bash
python3 scripts/bm25-index.py query "query" --top 10
python3 scripts/bm25-index.py stats
```
### Rerank strategy probe
```bash
python3 scripts/rerank.py "query" --peek
```
Reports which strategy will run (cosine via ollama / no-op).
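The cosine strategy amounts to sorting candidates by similarity between the query embedding and each chunk embedding. The sketch below works on precomputed vectors and makes no ollama call. The function names and the `embedding` key on each candidate are assumptions, not the schema used by `scripts/rerank.py`.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(query_vec, candidates, top=5):
    """Hypothetical reranker: highest cosine similarity first."""
    scored = [(cosine(query_vec, c["embedding"]), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top]]
```

The no-op strategy would correspond to returning the BM25 order unchanged when no embedding backend is reachable.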
---
## Integration with wiki-query
After this skill is installed, `skills/wiki-query/SKILL.md` standard and deep modes will:
1. Read `wiki/hot.md` (always — quick context).
2. Call `python3 scripts/retrieve.py "<>