ClaudeWave
Skill · 279 repo stars · updated 6d ago

chunking-strategy

The chunking-strategy skill provides methods for dividing documents into appropriately sized pieces for retrieval-augmented generation systems. It recommends specific chunk sizes between 256 and 1024 tokens, overlap percentages, and boundary detection approaches while validating semantic coherence and measuring retrieval performance. Use it when building or optimizing RAG pipelines, vector databases, or document processing workflows where retrieval quality needs improvement.

Install in Claude Code
git clone --depth 1 https://github.com/giuseppe-trisciuoglio/developer-kit /tmp/chunking-strategy && cp -r /tmp/chunking-strategy/plugins/developer-kit-ai/skills/chunking-strategy ~/.claude/skills/chunking-strategy
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Chunking Strategy for RAG Systems

## Overview

Provides chunking strategies for RAG systems, vector databases, and document processing. Recommends chunk sizes, overlap percentages, and boundary detection methods; validates semantic coherence; evaluates retrieval metrics.

## When to Use

Use when building or optimizing RAG systems, vector search pipelines, or document chunking workflows, or when tuning existing systems with poor retrieval quality.

## Instructions

### Choose Chunking Strategy

Select based on document type and use case:

1. **Fixed-Size Chunking** (Level 1)
   - Use for simple documents without clear structure
   - Start with 512 tokens and 10-20% overlap
   - Adjust: 256 for factoid queries, 1024 for analytical queries

2. **Recursive Character Chunking** (Level 2)
   - Use for documents with structural boundaries
   - Hierarchical separators: paragraphs → sentences → words
   - Customize for document types (HTML, Markdown, JSON)

3. **Structure-Aware Chunking** (Level 3)
   - Use for structured content (Markdown, code, tables, PDFs)
   - Preserve semantic units: functions, sections, table blocks
   - Validate structure preservation post-split

4. **Semantic Chunking** (Level 4)
   - Use for complex documents with thematic shifts
   - Embedding-based boundary detection with 0.8 similarity threshold
   - Buffer size: 3-5 sentences

5. **Advanced Methods** (Level 5)
   - Late Chunking for long-context models
   - Contextual Retrieval for high-precision requirements
   - Monitor computational cost vs. retrieval gain

Reference: [references/strategies.md](references/strategies.md).
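
To illustrate the Level 2 idea (paragraphs → sentences → words), here is a minimal sketch of a recursive splitter in plain Python. The function name `recursive_split`, the separator tuple, and the character-based `max_len` are assumptions made for illustration; they are not taken from the skill's reference files, and a production splitter would usually measure length in tokens.

```python
def recursive_split(text, max_len=200, separators=("\n\n", ". ", " ")):
    """Split on the coarsest separator first; recurse with finer ones when a piece is too long."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut every max_len characters.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_len:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)  # current chunk is full, so emit it
            current = ""
        if len(piece) <= max_len:
            current = piece
        else:
            # The piece alone is too long: retry it with the finer separators.
            chunks.extend(recursive_split(piece, max_len, finer))
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


sample = "Alpha beta gamma. Delta epsilon.\n\nZeta eta theta iota kappa lambda mu."
for chunk in recursive_split(sample, max_len=30):
    print(repr(chunk))
```

Separators consumed at a chunk boundary are dropped, so this sketch trades exact text reconstruction for simplicity.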

### Implement Chunking Pipeline

1. **Pre-process documents**
   - Analyze structure, content types, information density
   - Identify multi-modal content (tables, images, code)

2. **Select parameters**
   - Chunk size: embedding model context window / 4
   - Overlap: 10-20% for most cases
   - Strategy-specific settings

3. **Process and validate**
   - Apply chunking strategy
   - Validate coherence with the commands under "Validate Chunk Quality" below
   - Test with representative documents

4. **Evaluate and iterate**
   - Measure precision and recall
   - If precision < 0.7: reduce chunk_size by 25% and re-evaluate
   - If recall < 0.6: increase overlap by 10% and re-evaluate
   - Monitor latency and memory usage

Reference: [references/implementation.md](references/implementation.md).
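
The parameter rules in steps 2 and 4 can be sketched as two small helpers. The names `initial_params` and `adjust_params` are hypothetical, overlap is tracked in tokens, and "increase overlap by 10%" is read here as a relative increase; the text leaves that open, so treat it as an assumption.

```python
def initial_params(context_window_tokens, overlap_ratio=0.15):
    """Starting point: chunk size = context window / 4, overlap = 10-20% of chunk size."""
    chunk_size = context_window_tokens // 4
    return {"chunk_size": chunk_size, "overlap": int(chunk_size * overlap_ratio)}


def adjust_params(params, precision, recall):
    """Apply one iteration of the evaluate-and-iterate rules."""
    chunk_size, overlap = params["chunk_size"], params["overlap"]
    if precision < 0.7:
        chunk_size = int(chunk_size * 0.75)  # reduce chunk size by 25%
    if recall < 0.6:
        overlap = int(overlap * 1.10)  # assumption: a relative 10% increase
    # Overlap must stay smaller than the chunk itself.
    overlap = min(overlap, chunk_size - 1)
    return {"chunk_size": chunk_size, "overlap": overlap}


params = initial_params(2048)
print(params)
print(adjust_params(params, precision=0.65, recall=0.80))
```

Re-run the evaluation after each adjustment rather than applying several in a row, since the two thresholds interact.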

### Validate Chunk Quality

Run validation commands to assess chunk quality:

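```bash
# Check semantic coherence (requires sentence-transformers and numpy)
python -c "
import numpy as np
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
chunks = ['first chunk text', 'second chunk text']  # replace with your chunks
# Normalize so the dot product equals cosine similarity
embeddings = model.encode(chunks, normalize_embeddings=True)
sims = embeddings @ embeddings.T
# Average the off-diagonal entries only; the diagonal is self-similarity (always 1.0)
mask = ~np.eye(len(chunks), dtype=bool)
print(f'Cohesion: {sims[mask].mean():.3f}')  # target: 0.3-0.7
"

# Measure retrieval precision
python -c "
retrieved = ['c1', 'c2', 'c3']  # chunk IDs returned for a query (replace)
relevant_chunks = {'c1', 'c3'}  # ground-truth relevant chunk IDs (replace)
hits = sum(1 for c in retrieved if c in relevant_chunks)
precision = hits / len(retrieved) if retrieved else 0.0
print(f'Precision: {precision:.2f}')  # target: >= 0.7
"

# Check chunk size distribution
python -c "
import numpy as np
chunks = ['first chunk text', 'second chunk text']  # replace with your chunks
# Whitespace word counts are only an approximation of token counts
sizes = [len(c.split()) for c in chunks]
print(f'Mean: {np.mean(sizes):.0f}, Std: {np.std(sizes):.0f}')
print(f'Min: {min(sizes)}, Max: {max(sizes)}')
"
```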

Reference: [references/evaluation.md](references/evaluation.md).
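
The pipeline thresholds refer to both precision and recall, while the commands above compute only precision. A minimal sketch that computes both for one query follows; `precision_recall` is a hypothetical helper, and chunk IDs are assumed to be hashable values such as strings.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given retrieved and relevant chunk IDs."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty inputs instead of dividing by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


precision, recall = precision_recall(
    retrieved=["c1", "c2", "c3", "c4"],
    relevant=["c1", "c3", "c9"],
)
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
```

Averaging these values over a representative set of queries gives a steadier signal than any single query.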

## Examples

### Fixed-Size Chunking

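```python
# Requires the langchain-text-splitters package
from langchain_text_splitters import RecursiveCharacterTextSplitter

# With length_function=len, chunk_size and chunk_overlap count characters, not tokens.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=256,
    chunk_overlap=25,  # roughly 10% overlap
    length_function=len,
)
# `documents` is a list of LangChain Document objects loaded elsewhere.
chunks = splitter.split_documents(documents)
```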
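
The splitter above measures length with `len`, so its sizes are in characters, whereas the recommendations in this skill are stated in tokens. A minimal sketch of a sliding token window with overlap is shown below. The helper `fixed_size_chunks` is hypothetical, and whitespace splitting stands in for a real tokenizer, which would produce different counts.

```python
def fixed_size_chunks(tokens, chunk_size=512, overlap=64):
    """Slide a window of chunk_size tokens, stepping by chunk_size - overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the window reaches the end, so no chunk is pure overlap.
        if start + chunk_size >= len(tokens):
            break
    return chunks


text = "one two three four five six seven eight nine ten"
tokens = text.split()  # assumption: whitespace tokens as a stand-in tokenizer
for chunk in fixed_size_chunks(tokens, chunk_size=4, overlap=1):
    print(" ".join(chunk))
```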

### Structure-Aware Code Chunking

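```python
import ast

def chunk_python_code(code):
    """Return the source of each top-level function and class as one chunk."""
    tree = ast.parse(code)
    chunks = []
    # Iterate top-level nodes only: ast.walk would also yield methods and
    # nested functions, duplicating text already inside their parent's chunk.
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append(ast.get_source_segment(code, node))
    return chunks
```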

### Semantic Chunking

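```python
import re

import numpy as np
from sentence_transformers import SentenceTransformer

_model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_chunk(text, similarity_threshold=0.8):
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []
    # Normalized embeddings make the dot product equal cosine similarity
    embeddings = _model.encode(sentences, normalize_embeddings=True)
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        sim = float(np.dot(embeddings[i - 1], embeddings[i]))
        if sim < similarity_threshold:
            # Similarity dropped: close the current chunk and start a new one
            chunks.append(" ".join(current))
            current = [sentences[i]]
        else:
            current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```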

## Best Practices

### Core Principles
- Balance context preservation with retrieval precision
- Maintain semantic coherence within chunks
- Optimize for embedding model context window constraints

### Implementation
- Start with fixed-size (512 tokens, 15% overlap)
- Iterate based on document characteristics
- Test with domain-specific documents before deployment

### Pitfalls to Avoid
- Over-chunking: context-poor small chunks
- Under-chunking: oversized chunks that bury the relevant passage in unrelated text
- Ignoring semantic boundaries and document structure
- One-size-fits-all for diverse content types

## Constraints and Warnings

### Resource Considerations
- Semantic methods require significant compute resources
- Late chunking needs long-context embedding models
- Complex strategies increase processing latency
- Monitor memory for large document batches

### Quality Requirements
- Validate semantic coherence post-processing
- Test with representative documents before deployment
- Ensure chunks maintain standalone meaning
- Implement error handling for malformed content
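
As one way to handle malformed content, a structure-aware chunker can fall back to a simpler strategy when parsing fails. The sketch below assumes Python source as input; `safe_chunk_python` and the `fallback_lines` parameter are hypothetical names, and the fixed line-window fallback is only one possible choice.

```python
import ast


def safe_chunk_python(code, fallback_lines=40):
    """Chunk by top-level definitions; fall back to line windows if parsing fails."""
    try:
        tree = ast.parse(code)
    except (SyntaxError, ValueError):
        # Malformed source: degrade to fixed windows of lines instead of failing.
        lines = code.splitlines()
        return [
            "\n".join(lines[i:i + fallback_lines])
            for i in range(0, len(lines), fallback_lines)
        ]
    chunks = [
        ast.get_source_segment(code, node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    # A file with no definitions is kept as a single chunk.
    return chunks or [code]


broken = "def f(:\n  pass\nx = 1"
print(safe_chunk_python(broken, fallback_lines=2))
```

Logging which documents took the fallback path makes it easier to spot systematic parsing problems later.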

## References

- [strategies.md](references/strategies.md) - Detailed strategies
- [implementation.md](references/implementation.md) - Implementation guidelines
- [evaluation.md](references/evaluation.md) - Performance metrics
- [tools.md](references/tools.md) - Libraries and frameworks
- [research.md](references/research.md) - Research papers
- [advanced-strategies.md](references/advanced-strategies.md) - 11 advanced methods
- [semantic-met