
rag-retrieval

Retrieval-Augmented Generation patterns for grounded LLM responses. Use when building RAG pipelines, embedding documents, implementing hybrid search, contextual retrieval, HyDE, agentic RAG, multimodal RAG, query decomposition, reranking, or pgvector search.

Install in Claude Code
git clone --depth 1 https://github.com/yonatangross/orchestkit /tmp/rag-retrieval && cp -r /tmp/rag-retrieval/plugins/ork/skills/rag-retrieval ~/.claude/skills/rag-retrieval
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# RAG Retrieval

Comprehensive patterns for building production RAG systems. Each category has individual rule files in `rules/` that are loaded on demand.

## Quick Reference

| Category | Rules | Impact | When to Use |
|----------|-------|--------|-------------|
| [Core RAG](#core-rag) | 4 | CRITICAL | Basic RAG, citations, hybrid search, context management |
| [Embeddings](#embeddings) | 3 | HIGH | Model selection, chunking, batch/cache optimization |
| [Contextual Retrieval](#contextual-retrieval) | 3 | HIGH | Context-prepending, hybrid BM25+vector, pipeline |
| [HyDE](#hyde) | 3 | HIGH | Vocabulary mismatch, hypothetical document generation |
| [Agentic RAG](#agentic-rag) | 4 | HIGH | Self-RAG, CRAG, knowledge graphs, adaptive routing |
| [Multimodal RAG](#multimodal-rag) | 3 | MEDIUM | Image+text retrieval, PDF chunking, cross-modal search |
| [Query Decomposition](#query-decomposition) | 3 | MEDIUM | Multi-concept queries, parallel retrieval, RRF fusion |
| [Reranking](#reranking) | 3 | MEDIUM | Cross-encoder, LLM scoring, combined signals |
| [PGVector](#pgvector) | 4 | HIGH | PostgreSQL hybrid search, HNSW indexes, schema design |

**Total: 30 rules across 9 categories**

## Core RAG

Fundamental patterns for retrieval, generation, and pipeline composition.

| Rule | File | Key Pattern |
|------|------|-------------|
| Basic RAG | `rules/core-basic-rag.md` | Retrieve + context + generate with citations |
| Hybrid Search | `rules/core-hybrid-search.md` | RRF fusion (k=60) for semantic + keyword |
| Context Management | `rules/core-context-management.md` | Token budgeting + sufficiency check |
| Pipeline Composition | `rules/core-pipeline-composition.md` | Composable Decompose → HyDE → Retrieve → Rerank |
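
As a rough illustration of the RRF fusion named in the Hybrid Search row, the sketch below merges ranked ID lists by summing `1 / (k + rank)` per document with `k=60`. The function name `rrf_fuse` and the sample document IDs are hypothetical and do not come from the rule files.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1)
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a semantic and a keyword retriever
semantic = ["doc_a", "doc_b", "doc_c"]
keyword = ["doc_b", "doc_d", "doc_a"]
fused = rrf_fuse([semantic, keyword])
```

Because RRF uses only ranks, it needs no score normalization between the semantic and keyword retrievers.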

## Embeddings

Embedding models, chunking strategies, and production optimization.

| Rule | File | Key Pattern |
|------|------|-------------|
| Models & API | `rules/embeddings-models.md` | Model selection, batch API, similarity |
| Chunking | `rules/embeddings-chunking.md` | Semantic boundary splitting, 512 token sweet spot |
| Advanced | `rules/embeddings-advanced.md` | Redis cache, Matryoshka dims, batch processing |
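
A minimal sketch of semantic-boundary chunking with a token budget, assuming paragraphs (blank-line separated) are the semantic unit and that a whitespace word count is an acceptable stand-in for a real tokenizer. The name `chunk_by_paragraph` is hypothetical; the rule file may use a different splitter.

```python
def chunk_by_paragraph(text: str, max_tokens: int = 512) -> list[str]:
    """Pack whole paragraphs into chunks, aiming for at most `max_tokens` each."""

    def count_tokens(s: str) -> int:
        # Assumption: whitespace word count approximates the tokenizer's count
        return len(s.split())

    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        n = count_tokens(para)
        # Close the current chunk if this paragraph would exceed the budget
        if current and current_tokens + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        # A single paragraph larger than the budget is kept whole, not split
        current.append(para)
        current_tokens += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

In practice you would likely swap `count_tokens` for the tokenizer of your embedding model so the budget matches the model's real limits.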

## Contextual Retrieval

Anthropic's context-prepending technique. Anthropic reports up to 67% fewer retrieval failures when contextual embeddings are combined with contextual BM25 and reranking.

| Rule | File | Key Pattern |
|------|------|-------------|
| Context Prepending | `rules/contextual-prepend.md` | LLM-generated context + prompt caching |
| Hybrid Search | `rules/contextual-hybrid.md` | 40% BM25 / 60% vector weight split |
| Complete Pipeline | `rules/contextual-pipeline.md` | End-to-end indexing + hybrid retrieval |
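
The 40% BM25 / 60% vector split in the table can be sketched as a weighted sum of normalized scores. Min-max normalization is an assumption made here so that BM25 and cosine scores share a scale; the function names and sample scores are hypothetical.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores to [0, 1] so BM25 and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_hybrid(
    bm25: dict[str, float],
    vector: dict[str, float],
    bm25_weight: float = 0.4,
    vector_weight: float = 0.6,
) -> list[tuple[str, float]]:
    """Combine keyword and vector scores with a fixed weight split."""
    b, v = min_max(bm25), min_max(vector)
    combined = {
        # A document missing from one retriever contributes 0 for that signal
        doc_id: bm25_weight * b.get(doc_id, 0.0) + vector_weight * v.get(doc_id, 0.0)
        for doc_id in set(b) | set(v)
    }
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical raw scores from each retriever
ranked = weighted_hybrid(
    bm25={"a": 10.0, "b": 5.0, "c": 0.0},
    vector={"a": 0.2, "b": 0.9, "d": 0.5},
)
```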

## HyDE

Hypothetical Document Embeddings for bridging vocabulary gaps.

| Rule | File | Key Pattern |
|------|------|-------------|
| Generation | `rules/hyde-generation.md` | Embed hypothetical doc, not query |
| Per-Concept | `rules/hyde-per-concept.md` | Parallel HyDE for multi-topic queries |
| Fallback | `rules/hyde-fallback.md` | 2-3s timeout → direct embedding fallback |
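
The timeout fallback in the last row might look roughly like this. `generate_doc` and `embed` are hypothetical async callables standing in for your LLM and embedding clients, and the 2.5 s default is simply a value inside the 2-3s range from the table.

```python
import asyncio
from typing import Awaitable, Callable


async def hyde_embedding(
    query: str,
    generate_doc: Callable[[str], Awaitable[str]],
    embed: Callable[[str], Awaitable[list[float]]],
    timeout_s: float = 2.5,
) -> list[float]:
    """Embed a hypothetical answer; fall back to the raw query on timeout."""
    try:
        # HyDE: embed an LLM-written hypothetical document instead of the query
        text = await asyncio.wait_for(generate_doc(query), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Generation was too slow, so embed the query directly
        text = query
    return await embed(text)
```

The fallback keeps retrieval latency bounded: a slow generation call costs at most `timeout_s` before the pipeline continues with a plain query embedding.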

## Agentic RAG

Self-correcting retrieval with LLM-driven decision making.

| Rule | File | Key Pattern |
|------|------|-------------|
| Self-RAG | `rules/agentic-self-rag.md` | Binary document grading for relevance |
| Corrective RAG | `rules/agentic-corrective-rag.md` | CRAG workflow with web fallback |
| Knowledge Graph | `rules/agentic-knowledge-graph.md` | KG + vector hybrid for entity-rich domains |
| Adaptive Retrieval | `rules/agentic-adaptive-retrieval.md` | Query routing to optimal strategy |
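
A simplified sketch of the corrective flow: retrieve, grade each document with a binary relevance check, and fall back to web search when too few documents pass. The callables `retrieve`, `grade`, and `web_search` are hypothetical placeholders, and the full CRAG workflow in the rule file may include steps (such as query rewriting) that are omitted here.

```python
from typing import Callable


def corrective_retrieve(
    query: str,
    retrieve: Callable[[str], list[str]],
    grade: Callable[[str, str], bool],
    web_search: Callable[[str], list[str]],
    min_relevant: int = 1,
) -> dict:
    """Return graded documents, or web results if retrieval looks insufficient."""
    docs = retrieve(query)
    # Binary grading: keep only documents the grader marks as relevant
    relevant = [doc for doc in docs if grade(query, doc)]
    if len(relevant) >= min_relevant:
        return {"docs": relevant, "source": "vector_store"}
    # Too few relevant documents, so correct course with a web fallback
    return {"docs": web_search(query), "source": "web"}
```

In a real pipeline `grade` would typically be an LLM call that returns yes/no per document; injecting it as a callable keeps the control flow testable without a model.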

## Multimodal RAG

Image + text retrieval with cross-modal search.

| Rule | File | Key Pattern |
|------|------|-------------|
| Embeddings | `rules/multimodal-embeddings.md` | CLIP, SigLIP 2, Voyage multimodal-3 |
| Chunking | `rules/multimodal-chunking.md` | PDF extraction preserving images |
| Pipeline | `rules/multimodal-pipeline.md` | Dedup + hybrid retrieval + generation |

## Query Decomposition

Breaking complex queries into concepts for parallel retrieval.

| Rule | File | Key Pattern |
|------|------|-------------|
| Detection | `rules/query-detection.md` | Heuristic indicators (<1ms fast path) |
| Decompose + RRF | `rules/query-decompose.md` | LLM concept extraction + parallel retrieval |
| HyDE Combo | `rules/query-hyde-combo.md` | Decompose + HyDE for maximum coverage |
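
The heuristic fast path in the Detection row could be sketched as below. The specific indicators (word count, multiple question marks, a small set of conjunctions) are assumptions chosen for illustration; the rule file may use different ones.

```python
import re

# Hypothetical indicator words suggesting more than one concept in a query
_CONJUNCTIONS = re.compile(
    r"\b(and|vs\.?|versus|compared to|as well as)\b", re.IGNORECASE
)


def needs_decomposition(query: str, min_words: int = 6) -> bool:
    """Cheap check for multi-concept queries before paying for an LLM call."""
    # Short queries rarely bundle several concepts
    if len(query.split()) < min_words:
        return False
    # Several questions in one message usually means several retrievals
    if query.count("?") > 1:
        return True
    return bool(_CONJUNCTIONS.search(query))
```

Only queries that pass this check would be sent to the LLM for concept extraction; everything else takes the single-retrieval path.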

## Reranking

Post-retrieval re-scoring for higher precision.

| Rule | File | Key Pattern |
|------|------|-------------|
| Cross-Encoder | `rules/reranking-cross-encoder.md` | ms-marco-MiniLM (~50ms, free) |
| LLM Reranking | `rules/reranking-llm.md` | Batch scoring + Cohere API |
| Combined | `rules/reranking-combined.md` | Multi-signal weighted scoring |
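
Multi-signal weighted scoring can be illustrated with a small sketch. The signal names, the `recency` signal, and the weights are all hypothetical, and every signal is assumed to be pre-normalized to [0, 1].

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class Candidate:
    doc_id: str
    retrieval_score: float  # e.g. fused hybrid score, assumed in [0, 1]
    rerank_score: float     # e.g. cross-encoder score, assumed in [0, 1]
    recency: float          # hypothetical freshness signal in [0, 1]


# Illustrative weights only; tune them against your own evaluation set
DEFAULT_WEIGHTS: Mapping[str, float] = {
    "retrieval_score": 0.3,
    "rerank_score": 0.6,
    "recency": 0.1,
}


def combined_rerank(
    candidates: list[Candidate],
    weights: Mapping[str, float] = DEFAULT_WEIGHTS,
    top_n: int = 3,
) -> list[Candidate]:
    """Order candidates by a weighted sum of their signals and keep the top N."""

    def score(c: Candidate) -> float:
        return sum(w * getattr(c, name) for name, w in weights.items())

    return sorted(candidates, key=score, reverse=True)[:top_n]
```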

## PGVector

Production hybrid search with PostgreSQL.

| Rule | File | Key Pattern |
|------|------|-------------|
| Schema | `rules/pgvector-schema.md` | HNSW index + pre-computed tsvector |
| Hybrid Search | `rules/pgvector-hybrid-search.md` | SQLAlchemy RRF with FULL OUTER JOIN |
| Indexing | `rules/pgvector-indexing.md` | HNSW (17x faster) vs IVFFlat |
| Metadata | `rules/pgvector-metadata.md` | Filtering, boosting, Redis 8 comparison |
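
To make the Schema and Hybrid Search rows concrete, here is a hedged sketch of the SQL involved, held in Python strings. The table and column names (`chunks`, `content`, `embedding`, `tsv`), the 1536-dimension vector, and the candidate limit of 20 are assumptions; the named parameters (`:qvec`, `:query`, `:top_k`) follow SQLAlchemy's `text()` style. The rule files may structure this differently.

```python
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS chunks (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(1536),
    -- Pre-computed tsvector so keyword search does not re-parse text per query
    tsv       tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
);

CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);
CREATE INDEX IF NOT EXISTS chunks_tsv_gin ON chunks USING gin (tsv);
"""

HYBRID_SQL = """
WITH semantic AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY embedding <=> CAST(:qvec AS vector)) AS rank
    FROM chunks
    ORDER BY embedding <=> CAST(:qvec AS vector)
    LIMIT 20
),
keyword AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY ts_rank_cd(tsv, q) DESC) AS rank
    FROM chunks, plainto_tsquery('english', :query) AS q
    WHERE tsv @@ q
    ORDER BY ts_rank_cd(tsv, q) DESC
    LIMIT 20
)
-- RRF with k=60; FULL OUTER JOIN keeps hits found by only one retriever
SELECT COALESCE(s.id, k.id) AS id,
       COALESCE(1.0 / (60 + s.rank), 0.0)
     + COALESCE(1.0 / (60 + k.rank), 0.0) AS rrf_score
FROM semantic s
FULL OUTER JOIN keyword k ON s.id = k.id
ORDER BY rrf_score DESC
LIMIT :top_k;
"""


def to_vector_literal(embedding: list[float]) -> str:
    """Format an embedding in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(x) for x in embedding) + "]"
```

With SQLAlchemy, the query could be run as `session.execute(text(HYBRID_SQL), {"qvec": to_vector_literal(vec), "query": q, "top_k": 5})`, assuming the pgvector extension is installed in the database.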

## Quick Start Example

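```python
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def rag_query(question: str, vector_db, top_k: int = 5, model: str = "gpt-4o-mini") -> dict:
    """Basic RAG with citations.

    `vector_db` is a placeholder for your vector store client. It is assumed to
    expose an async `search(query, limit=...)` returning docs with `.text` and
    `.metadata`. The default `model` is illustrative; use any chat model.
    """
    docs = await vector_db.search(question, limit=top_k)
    # Number each chunk so the model can cite it as [1], [2], ...
    context = "\n\n".join(f"[{i+1}] {doc.text}" for i, doc in enumerate(docs))

    response = await client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Answer with inline citations [1], [2]. Use ONLY provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )

    return {
        "answer": response.choices[0].message.content,
        "sources": [d.metadata["source"] for d in docs],
    }
```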

## Key Decisions

| Decision | Recommendation |
|----------|----------------|
| Embedding model | `text-embedding-3-small` (general), `voyage-3.5` (production) |
| Chunk size | 256-1024 tokens (512 typical) |
| Hybrid weight | 40% BM25 / 60% vector |
| Top-k |