rag
This skill implements core components for Retrieval-Augmented Generation systems, including document chunking, embedding generation, vector storage integration, and query retrieval pipelines. Use it when building Q&A systems over proprietary documents, creating knowledge-base-backed chatbots, implementing semantic search functionality, reducing AI hallucinations with grounded responses, or developing documentation assistants that need to access domain-specific information from external knowledge bases.
git clone --depth 1 https://github.com/giuseppe-trisciuoglio/developer-kit /tmp/rag && cp -r /tmp/rag/plugins/developer-kit-ai/skills/rag ~/.claude/skills/ragSKILL.md
# RAG Implementation
Build Retrieval-Augmented Generation systems that extend AI capabilities with external knowledge sources.
## Overview
This skill covers: document processing, embedding generation, vector storage, retrieval configuration, and RAG pipeline implementation.
## When to Use
- Building Q&A systems over proprietary documents
- Creating chatbots with factual information from knowledge bases
- Implementing semantic search with natural language queries
- Reducing hallucinations with grounded, sourced responses
- Building documentation assistants and research tools
- Enabling AI systems to access domain-specific knowledge
## Instructions
### Step 1: Choose Vector Database
Select based on your requirements:
| Requirement | Recommended |
|-------------|-------------|
| Production scalability | Pinecone, Milvus |
| Open-source | Weaviate, Qdrant |
| Local development | Chroma, FAISS |
| Hybrid search | Weaviate with BM25 |
### Step 2: Select Embedding Model
| Use Case | Model |
|----------|-------|
| General purpose | text-embedding-ada-002 |
| Fast and lightweight | all-MiniLM-L6-v2 |
| Multilingual | e5-large-v2 |
| Best performance | bge-large-en-v1.5 |
### Step 3: Implement Document Processing Pipeline
1. Load documents from source (file system, database, API)
2. Clean and preprocess (remove formatting, normalize text)
3. Split documents into chunks with appropriate strategy
4. Generate embeddings for each chunk
5. Store embeddings in vector database with metadata
**Validation**: Verify embeddings were generated successfully:
```java
List<Embedding> embeddings = embeddingModel.embedAll(segments);
if (embeddings.isEmpty() || embeddings.get(0).dimension() != expectedDim) {
throw new IllegalStateException("Embedding generation failed");
}
```
### Step 4: Configure Retrieval Strategy
Choose the appropriate strategy:
- **Dense Retrieval**: Semantic similarity via embeddings (default for most cases)
- **Hybrid Search**: Dense + sparse retrieval for better coverage
- **Metadata Filtering**: Filter by document attributes
- **Reranking**: Cross-encoder reranking for high-precision requirements
### Step 5: Build RAG Pipeline
1. Create content retriever with your embedding store
2. Configure AI service with retriever and chat memory
3. Implement prompt template with context injection
4. Add response validation and grounding checks
**Validation**: Test with known queries to verify context injection works correctly.
**Error Handling**: For batch ingestion, wrap in retry logic:
```java
for (Document doc : documents) {
int attempts = 0;
while (attempts < 3) {
try {
store.add(embeddingModel.embed(doc).content(), doc.toTextSegment());
break;
} catch (EmbeddingException e) {
attempts++;
if (attempts == 3) throw new RuntimeException("Failed after 3 retries", e);
}
}
}
```
### Step 6: Evaluate and Optimize
1. Measure retrieval metrics: precision@k, recall@k, MRR
2. Evaluate answer quality: faithfulness, relevance
3. Monitor performance and user feedback
4. Iterate on chunking, retrieval, and prompt parameters
## Examples
### Example 1: Basic Document Q&A
```java
List<Document> documents = FileSystemDocumentLoader.loadDocuments("/docs");
InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
EmbeddingStoreIngestor.ingest(documents, store);
DocumentAssistant assistant = AiServices.builder(DocumentAssistant.class)
.chatModel(chatModel)
.contentRetriever(EmbeddingStoreContentRetriever.from(store))
.build();
String answer = assistant.answer("What is the company policy on remote work?");
```
### Example 2: Metadata-Filtered Retrieval
```java
EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
.embeddingStore(store)
.embeddingModel(embeddingModel)
.maxResults(5)
.minScore(0.7)
.filter(metadataKey("category").isEqualTo("technical"))
.build();
```
### Example 3: Multi-Source RAG Pipeline
```java
ContentRetriever webRetriever = EmbeddingStoreContentRetriever.from(webStore);
ContentRetriever docRetriever = EmbeddingStoreContentRetriever.from(docStore);
List<Content> results = new ArrayList<>();
results.addAll(webRetriever.retrieve(query));
results.addAll(docRetriever.retrieve(query));
List<Content> topResults = reranker.reorder(query, results).subList(0, 5);
```
### Example 4: RAG with Chat Memory
```java
Assistant assistant = AiServices.builder(Assistant.class)
.chatModel(chatModel)
.chatMemory(MessageWindowChatMemory.withMaxMessages(10))
.contentRetriever(retriever)
.build();
assistant.chat("Tell me about the product features");
assistant.chat("What about pricing for those features?"); // Maintains context
```
## Best Practices
### Document Preparation
- Clean documents before ingestion; remove irrelevant content and formatting
- Add relevant metadata for filtering and context
### Chunking Strategy
- Use 500-1000 tokens per chunk for optimal balance
- Include 10-20% overlap to preserve context at boundaries
- Test different sizes for your specific use case
### Retrieval Optimization
- Start with high k values (10-20), then filter/rerank
- Use metadata filtering to improve relevance
- Monitor retrieval quality and iterate based on user feedback
### Performance
- Cache embeddings for frequently accessed content
- Use batch processing for document ingestion
- Optimize vector store indexing for your scale
## Constraints and Warnings
### System Constraints
- Embedding models have maximum token limits per document
- Vector databases require proper indexing for performance
- Chunk boundaries may lose context for complex documents
- Hybrid search requires additional infrastructure
### Quality Warnings
- Retrieval quality depends heavily on chunking strategy
- Embedding models may not capture domain-specific semantics
- Metadata filtering requires proper document annotation
- ReProvides chunking strategies for RAG systems. Generates chunk size recommendations (256-1024 tokens), overlap percentages (10-20%), and semantic boundary detection methods. Validates semantic coherence and evaluates retrieval precision/recall metrics. Use when building retrieval-augmented generation systems, vector databases, or processing large documents.
>
Provides AWS CloudFormation patterns for Auto Scaling including EC2, ECS, and Lambda. Use when creating Auto Scaling groups, launch configurations, launch templates, scaling policies, lifecycle hooks, and predictive scaling. Covers template structure with Parameters, Outputs, Mappings, Conditions, cross-stack references, and best practices for high availability and cost optimization.
Provides AWS CloudFormation patterns for Amazon Bedrock resources including agents, knowledge bases, data sources, guardrails, prompts, flows, and inference profiles. Use when creating Bedrock agents with action groups, implementing RAG with knowledge bases, configuring vector stores, setting up content moderation guardrails, managing prompts, orchestrating workflows with flows, and configuring inference profiles for model optimization.
Provides AWS CloudFormation patterns for CloudFront distributions, origins (ALB, S3, Lambda@Edge, VPC Origins), CacheBehaviors, Functions, SecurityHeaders, parameters, Outputs and cross-stack references. Use when creating CloudFront distributions with CloudFormation, configuring multiple origins, implementing caching strategies, managing custom domains with ACM, configuring WAF, and optimizing performance.
Provides AWS CloudFormation patterns for CloudWatch monitoring, metrics, alarms, dashboards, logs, and observability. Use when creating CloudWatch metrics, alarms, dashboards, log groups, log subscriptions, anomaly detection, synthesized canaries, Application Signals, and implementing template structure with Parameters, Outputs, Mappings, Conditions, cross-stack references, and CloudWatch best practices for monitoring production infrastructure.
Provides AWS CloudFormation patterns for DynamoDB tables, GSIs, LSIs, auto-scaling, and streams. Use when creating DynamoDB tables with CloudFormation, configuring primary keys, local/global secondary indexes, capacity modes (on-demand/provisioned), point-in-time recovery, encryption, TTL, and implementing template structure with Parameters, Outputs, Mappings, Conditions, cross-stack references.
Provides AWS CloudFormation patterns for EC2 instances, Security Groups, IAM roles, and load balancers. Use when creating EC2 instances, SPOT instances, Security Groups, IAM roles for EC2, Application Load Balancers (ALB), Target Groups, and implementing template structure with Parameters, Outputs, Mappings, Conditions, and cross-stack references.