
rag-dev

The rag-dev skill provides tools for building knowledge bases and running semantic search over private documents using Butterbase RAG. Use it when ingesting text or files into collections, polling ingestion status, performing similarity searches across embedded document chunks, and optionally synthesizing LLM-generated answers from retrieved content. The skill handles asynchronous document processing with pgvector embeddings and supports configurable access modes for controlling who can query each collection.

Install in Claude Code
git clone --depth 1 https://github.com/butterbase-ai/butterbase-skills /tmp/rag-dev && cp -r /tmp/rag-dev/skills/rag-dev ~/.claude/skills/rag-dev
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Butterbase RAG (Retrieval-Augmented Generation)

Two tools cover the entire RAG surface:

- **`manage_rag_content`** — collections, document ingestion, status polling, deletion
- **`rag_query`** — semantic search, optional LLM synthesis

Documents are ingested asynchronously: text or files are split into chunks, embedded, and stored in pgvector, and each query runs a similarity search over those embeddings at query time.

---

## 1. The mental model

```
Collection                        Documents                     Chunks
──────────                       ──────────                     ──────
"product-faq" ──────────────►   doc_1 (PDF) ───────────►       chunk 1, 2, 3...
                                doc_2 (text) ──────────►       chunk 4, 5...
                                doc_3 (markdown) ──────►       chunk 6...
```

A **collection** holds documents; a **document** is split into **chunks** and embedded; **`rag_query`** searches by cosine similarity across chunks within a collection.

`chunk_size` and `chunk_overlap` are set **once at collection creation** and are immutable afterward. To change them, delete and recreate the collection.
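
To make "searches by cosine similarity" concrete, here is a minimal sketch of ranking chunks against a query vector. It only illustrates the idea: the real search runs server-side in pgvector, and the names `cosineSimilarity` and `rankChunks`, as well as the toy 3-dimensional vectors, are assumptions made for this example.

```js
// Illustrative only: the real search runs server-side in pgvector.
// Cosine similarity = dot(a, b) / (|a| * |b|).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk, drop those below the threshold, keep the best topK.
function rankChunks(queryVector, chunks, topK = 5, threshold = 0) {
  return chunks
    .map((chunk) => ({ id: chunk.id, score: cosineSimilarity(queryVector, chunk.vector) }))
    .filter((c) => c.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy vectors; real embeddings have hundreds or thousands of dimensions.
const toyChunks = [
  { id: "chunk 1", vector: [1, 0, 0] },
  { id: "chunk 2", vector: [0.6, 0.8, 0] },
  { id: "chunk 3", vector: [0, 0, 1] },
];
console.log(rankChunks([1, 0, 0], toyChunks, 2, 0.5));
```

With these toy vectors, "chunk 1" ranks first, "chunk 2" second, and "chunk 3" is dropped because it falls below the 0.5 floor.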

---

## 2. End-to-end workflow

```
┌────────────────────────────────────────────┐
│ 1. create_collection (once per knowledge)  │
├────────────────────────────────────────────┤
│ 2. ingest_document (text OR storage_object)│
├────────────────────────────────────────────┤
│ 3. poll get_document_status until "ready"  │
├────────────────────────────────────────────┤
│ 4. rag_query (with or without synthesis)   │
└────────────────────────────────────────────┘
```

### Step 1 — create the collection

```js
manage_rag_content({
  app_id: "app_abc123",
  action: "create_collection",
  name: "product-faq",
  description: "Customer-facing product knowledge",
  chunk_size: 512,         // optional, default 512 tokens
  chunk_overlap: 50,       // optional, default 50 tokens
  access_mode: "shared"    // optional: "private" | "shared" | "custom"
})
```

| `access_mode` | Who can query |
|----------------|---------------|
| `private` (default) | Only the app owner / service key |
| `shared` | Any authenticated end-user with a valid JWT |
| `custom` | Respects RLS policies — for fine-grained control |

### Step 2a — ingest raw text

```js
manage_rag_content({
  app_id: "app_abc123",
  action: "ingest_document",
  collection: "product-faq",
  text: "Our return policy is 30 days from purchase...",
  filename: "return-policy.txt",          // optional, for display
  metadata: { category: "returns", tier: "all" }   // filter later in rag_query
})
// → { document_id: "doc_xyz", status: "pending" }
```

### Step 2b — ingest an uploaded file

Files must be uploaded through `manage_storage` first; ingestion then takes the resulting object ID. The two steps:

```js
// 1. Upload the file via the storage skill to get an object_id.
//    uploadPdfViaStorage is a placeholder for your own manage_storage upload step.
const { object_id } = await uploadPdfViaStorage(/* ... */);

// 2. Hand that object_id to RAG ingestion
manage_rag_content({
  app_id: "app_abc123",
  action: "ingest_document",
  collection: "product-faq",
  storage_object_id: object_id,
  filename: "manual.pdf",
  metadata: { product: "v3" }
})
```

Supported file types: **PDF, TXT, Markdown, CSV, HTML, DOCX, XLSX, PPTX**.
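
A quick client-side check before uploading can catch an unsupported file early, before an ingestion job is queued. The sketch below is a convenience only: the extension list is an assumption derived from the file types named above, `isLikelySupported` is a hypothetical helper, and the service may identify file types differently (for example by MIME type).

```js
// Assumption: these extensions correspond to the supported types listed above.
// The service may detect file types differently (for example by MIME type).
const SUPPORTED_EXTENSIONS = new Set([
  "pdf", "txt", "md", "markdown", "csv", "html", "htm", "docx", "xlsx", "pptx",
]);

// Returns true when the filename ends in one of the assumed extensions.
function isLikelySupported(filename) {
  const dot = filename.lastIndexOf(".");
  if (dot < 0 || dot === filename.length - 1) return false; // no extension
  return SUPPORTED_EXTENSIONS.has(filename.slice(dot + 1).toLowerCase());
}

console.log(isLikelySupported("manual.pdf"));  // true
console.log(isLikelySupported("archive.zip")); // false
```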

### Step 3 — poll until ready

Ingestion is asynchronous: the `ingest_document` call returns immediately, and the document then moves through `pending → processing → ready` (or `failed`). Poll for the current status:

```js
manage_rag_content({
  app_id: "app_abc123",
  action: "get_document_status",
  collection: "product-faq",
  document_id: "doc_xyz"
})
// → { id, filename, status: "processing", processedAt, errorMessage? }
```

Recommended cadence: poll every 2–5 seconds for the first minute, back off after that. Bigger files (large PDFs, XLSX) take longer.
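
The cadence above can be sketched as a small polling loop. This is one possible shape, not a Butterbase API: `pollUntilReady` is a hypothetical helper, `getStatus` is a function you supply that performs the `get_document_status` call, and the back-off ceiling and overall timeout are assumed values.

```js
// Hypothetical helper: `getStatus` is any async function you supply that performs the
// get_document_status call shown above and resolves to { status, errorMessage? }.
async function pollUntilReady(getStatus, options = {}) {
  const {
    initialDelayMs = 2000, // fast end of the recommended 2 to 5 second range
    maxDelayMs = 30000,    // assumed ceiling for the back-off
    fastWindowMs = 60000,  // poll quickly for the first minute
    timeoutMs = 600000,    // assumed overall limit (10 minutes)
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  let elapsed = 0; // counts time spent sleeping, not time spent in requests
  let delay = initialDelayMs;
  while (elapsed <= timeoutMs) {
    const doc = await getStatus();
    if (doc.status === "ready") return doc;
    if (doc.status === "failed") {
      throw new Error(`Ingestion failed: ${doc.errorMessage ?? "unknown error"}`);
    }
    await sleep(delay);
    elapsed += delay;
    // After the first minute, double the delay up to the ceiling.
    if (elapsed >= fastWindowMs) delay = Math.min(delay * 2, maxDelayMs);
  }
  throw new Error(`Document not ready after ${timeoutMs} ms`);
}
```

Passing `sleep` as an option keeps the loop testable without real waiting. In practice `getStatus` would wrap the `manage_rag_content` call from this step and return its result.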

### Step 4 — query

Two modes: raw retrieval (just chunks back) or synthesized (LLM answer + sources).

#### Raw retrieval

```js
rag_query({
  app_id: "app_abc123",
  collection: "product-faq",
  query: "How long do I have to return an item?",
  top_k: 5,                  // default 5, max 20
  threshold: 0.7,            // optional similarity floor (0..1)
  filter: { category: "returns" }    // optional metadata filter
})
// → { chunks: [{ text, score, document_id, metadata }, ...] }
```

#### Synthesized answer

```js
rag_query({
  app_id: "app_abc123",
  collection: "product-faq",
  query: "How long do I have to return an item?",
  synthesize: true,
  model: "anthropic/claude-haiku-4.5"   // default
})
// → { answer, chunks, model }
```

`synthesize: true` runs the retrieved chunks through an LLM and returns a grounded answer. `chunks` is still included so you can show citations.
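
Since `chunks` comes back alongside `answer`, a citation list can be built on the client. The following is a minimal sketch assuming the chunk shape shown under raw retrieval (`{ text, score, document_id, metadata }`); `buildCitations` is a hypothetical helper and the sample response is made up for illustration.

```js
// Builds a de-duplicated source list from a synthesized rag_query response.
// Assumes the chunk shape shown above: { text, score, document_id, metadata }.
function buildCitations(response) {
  const bestByDocument = new Map();
  for (const chunk of response.chunks ?? []) {
    const seen = bestByDocument.get(chunk.document_id);
    // Keep only the highest-scoring chunk per document.
    if (!seen || chunk.score > seen.score) {
      bestByDocument.set(chunk.document_id, chunk);
    }
  }
  return [...bestByDocument.values()]
    .sort((a, b) => b.score - a.score)
    .map((chunk, index) => ({
      label: `[${index + 1}]`,
      document_id: chunk.document_id,
      score: chunk.score,
      excerpt: chunk.text.slice(0, 80),
    }));
}

// Made-up response for illustration.
const sampleResponse = {
  answer: "You have 30 days from purchase.",
  model: "anthropic/claude-haiku-4.5",
  chunks: [
    { text: "Our return policy is 30 days from purchase...", score: 0.91, document_id: "doc_xyz", metadata: {} },
    { text: "Returns require a receipt.", score: 0.78, document_id: "doc_xyz", metadata: {} },
    { text: "Shipping takes 3 to 5 days.", score: 0.72, document_id: "doc_abc", metadata: {} },
  ],
};
console.log(sampleResponse.answer);
for (const c of buildCitations(sampleResponse)) {
  console.log(c.label, c.document_id, c.excerpt);
}
```

Grouping by `document_id` keeps the source list short when several chunks from the same document are retrieved.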

---

## 3. Listing and cleanup

```js
manage_rag_content({ app_id, action: "list_collections" })
manage_rag_content({ app_id, action: "get_collection", name: "product-faq" })
manage_rag_content({ app_id, action: "list_documents", collection: "product-faq" })
manage_rag_content({ app_id, action: "delete_document", collection: "product-faq", document_id: "doc_xyz" })
manage_rag_content({ app_id, action: "delete_collection", name: "product-faq" })
```

`get_collection` returns `{ name, description, accessMode, chunkSize, chunkOverlap, createdAt, documentCount: { pending, processing, ready, failed } }` — handy for a dashboard view.

> **Both `delete_document` and `delete_collection` are irreversible** and remove embeddings. To replace a document, delete then re-ingest.
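
The delete-then-re-ingest sequence can be wrapped in one helper. This is a sketch under assumptions: `replaceDocument` is hypothetical, and `callTool` stands for whatever function your environment uses to invoke `manage_rag_content`; it is injected so the example makes no claim about the client API.

```js
// Hypothetical wrapper: `callTool` is whatever function your environment uses to invoke
// manage_rag_content. It is passed in so this sketch assumes nothing about the client.
async function replaceDocument(callTool, { app_id, collection, document_id, text, filename, metadata }) {
  // Delete first: this is irreversible and removes the old embeddings.
  await callTool({ app_id, action: "delete_document", collection, document_id });
  // Re-ingest the new content; the result has the shape shown in Step 2a.
  return callTool({ app_id, action: "ingest_document", collection, text, filename, metadata });
}
```

Between the delete and the moment the new document reaches `ready`, queries will not see either version, so poll the returned `document_id` before relying on the updated content.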

---

## 4. Choosing chunk size and overlap

| Use case | Suggested `chunk_size` | `chunk_overlap` |
|----------|------------------------|------------------|
| Q&A over short FAQs / docs | 256–512 | 50 |
| Long-form documentation, manuals | 512–1024 | 100 |
| Code or structured content | 1024–2048 | 0–50 |
| Conversational logs / transcripts | 256 | 50 |

Larger chunks preserve more context but reduce retrieval granularity (you may pull in irrelevant nearby content). Overlap repeats the tail of one chunk at the start of the next, so a sentence or idea that straddles a boundary still appears whole in at least one chunk. **You can't change these without recreating the collection**, so pick them deliberately the first time.
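
To see how size and overlap interact, the sketch below applies a plain sliding window to an array of tokens and estimates how many chunks a document produces. It is illustrative only: how Butterbase tokenizes and where it places chunk boundaries is not described here, and `chunkTokens` and `estimateChunkCount` are hypothetical names.

```js
// Sliding-window chunking over an array of tokens. Illustrative only: the service's
// actual tokenizer and boundary rules are not specified in this document.
function chunkTokens(tokens, chunkSize, chunkOverlap) {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("need chunkSize > 0 and 0 <= chunkOverlap < chunkSize");
  }
  const stride = chunkSize - chunkOverlap; // how far each window advances
  const chunks = [];
  for (let start = 0; start < tokens.length; start += stride) {
    chunks.push(tokens.slice(start, start + chunkSize));
    if (start + chunkSize >= tokens.length) break; // last window reached the end
  }
  return chunks;
}

// Chunk count for a document of `tokenCount` tokens under the same sliding window.
function estimateChunkCount(tokenCount, chunkSize, chunkOverlap) {
  if (tokenCount <= 0) return 0;
  if (tokenCount <= chunkSize) return 1;
  return Math.ceil((tokenCount - chunkOverlap) / (chunkSize - chunkOverlap));
}

console.log(estimateChunkCount(10000, 512, 50));   // 22
console.log(estimateChunkCount(10000, 1024, 100)); // 11
```

Under this model, a 10,000-token document yields 22 chunks at the default 512/50 and 11 chunks at 1024/100, which shows the granularity trade-off in concrete numbers.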

---

## 5. Metadata-driven filtering

Anything