using-vector-databases
This Claude Code skill provides implementation guidance for vector databases across multiple platforms including Qdrant, Pinecone, Milvus, and pgvector, alongside embedding generation strategies and chunking techniques. Use this skill when building retrieval-augmented generation systems, semantic search functionality, recommendation engines, or similarity-based document retrieval for chatbots and knowledge base applications.
git clone --depth 1 https://github.com/ancoleman/ai-design-components /tmp/using-vector-databases && cp -r /tmp/using-vector-databases/skills/using-vector-databases ~/.claude/skills/using-vector-databases
# Vector Databases for AI Applications
## When to Use This Skill
Use this skill when implementing:
- **RAG (Retrieval-Augmented Generation)** systems for AI chatbots
- **Semantic search** capabilities (meaning-based, not just keyword)
- **Recommendation systems** based on similarity
- **Multi-modal AI** (unified search across text, images, audio)
- **Document similarity** and deduplication
- **Question answering** over private knowledge bases
## Quick Decision Framework
### 1. Vector Database Selection
```
START: Choosing a Vector Database

EXISTING INFRASTRUCTURE?
├─ Using PostgreSQL already?
│  └─ pgvector (<10M vectors, tight budget)
│      See: references/pgvector.md
│
└─ No existing vector database?
   │
   ├─ OPERATIONAL PREFERENCE?
   │  │
   │  ├─ Zero-ops managed only
   │  │  └─ Pinecone (fully managed, excellent DX)
   │  │      See: references/pinecone.md
   │  │
   │  └─ Flexible (self-hosted or managed)
   │     │
   │     ├─ SCALE: <100M vectors + complex filtering ⭐
   │     │  └─ Qdrant (RECOMMENDED)
   │     │      • Best metadata filtering
   │     │      • Built-in hybrid search (BM25 + Vector)
   │     │      • Self-host: Docker/K8s
   │     │      • Managed: Qdrant Cloud
   │     │      See: references/qdrant.md
   │     │
   │     ├─ SCALE: >100M vectors + GPU acceleration
   │     │  └─ Milvus / Zilliz Cloud
   │     │      See: references/milvus.md
   │     │
   │     ├─ Embedded / No server
   │     │  └─ LanceDB (serverless, edge deployment)
   │     │
   │     └─ Local prototyping
   │        └─ Chroma (simple API, in-memory)
```
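The same decision tree can be sketched as a small function, which is handy when the choice has to be made in a setup script. This is only an illustration: the `Requirements` fields and the `choose_vector_db` name are hypothetical. The tree does not say how ties are resolved, so the order of the checks is an assumption, as is routing any collection above 100M vectors to Milvus.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical inputs mirroring the questions in the decision tree."""
    uses_postgres: bool = False   # PostgreSQL already in the stack?
    managed_only: bool = False    # zero-ops, fully managed service required?
    embedded: bool = False        # no server process allowed?
    prototyping: bool = False     # local experiment only?
    vector_count: int = 1_000_000


def choose_vector_db(req: Requirements) -> str:
    """Return the database the tree suggests (check order is an assumption)."""
    # Existing PostgreSQL and a modest corpus: stay on pgvector.
    if req.uses_postgres and req.vector_count < 10_000_000:
        return "pgvector"
    # Zero-ops preference: fully managed service.
    if req.managed_only:
        return "Pinecone"
    # No server at all: embedded store.
    if req.embedded:
        return "LanceDB"
    # Throwaway local experiments.
    if req.prototyping:
        return "Chroma"
    # Very large collections.
    if req.vector_count > 100_000_000:
        return "Milvus / Zilliz Cloud"
    # Default recommendation for everything else.
    return "Qdrant"


print(choose_vector_db(Requirements(uses_postgres=True, vector_count=5_000_000)))
# → pgvector
```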
### 2. Embedding Model Selection
```
REQUIREMENTS?
├─ Best quality (cost no object)
│ └─ Voyage AI voyage-3 (1024d)
│ • 9.74% better than OpenAI on MTEB
│ • ~$0.12/1M tokens
│ See: references/embedding-strategies.md
│
├─ Enterprise reliability
│ └─ OpenAI text-embedding-3-large (3072d)
│ • Industry standard
│ • ~$0.13/1M tokens
│ • Matryoshka shortening: reduce to 256/512/1024d
│
├─ Cost-optimized
│ └─ OpenAI text-embedding-3-small (1536d)
│ • ~$0.02/1M tokens (6x cheaper)
│ • 90-95% of large model performance
│
├─ Multilingual (100+ languages)
│ └─ Cohere embed-v3 (1024d)
│ • ~$0.10/1M tokens
│
└─ Self-hosted / Privacy-critical
   ├─ English: nomic-embed-text-v1.5 (768d, Apache 2.0)
   ├─ Multilingual: BAAI/bge-m3 (1024d, MIT)
   └─ Long docs: jina-embeddings-v2 (768d, 8K context)
```
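The dimension-shortening option listed for `text-embedding-3-large` (256/512/1024d) amounts to keeping the leading components of the vector and rescaling the result to unit length. This is often called Matryoshka-style truncation. The sketch below shows the arithmetic on a toy vector. `shorten_embedding` is a hypothetical helper, not part of any SDK. The OpenAI API can also return shortened vectors directly through its `dimensions` request parameter. Truncation like this is only meaningful for models trained to support it.

```python
import math


def shorten_embedding(vector: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and rescale to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to normalize
    return [x / norm for x in head]


# Toy 4-d unit vector standing in for a real 3072-d embedding.
full = [0.5, 0.5, 0.5, 0.5]
short = shorten_embedding(full, 2)
print(len(short), sum(x * x for x in short))  # length 2, squared norm close to 1.0
```

Every vector in a collection, including query vectors, has to be shortened to the same dimension, and the collection's configured vector size has to match it.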
## Core Concepts
### Document Chunking Strategy
**Recommended defaults for most RAG systems:**
- **Chunk size:** 512 tokens (not characters)
- **Overlap:** 50 tokens (~10% of the chunk size)
**Why these numbers?**
- 512 tokens balances context vs. precision
- Too small (128-256): Fragments concepts, loses context
- Too large (1024-2048): Dilutes relevance, wastes LLM tokens
- A 50-token overlap reduces the chance that a sentence or idea is cut off at a chunk boundary
See `references/chunking-patterns.md` for advanced strategies by content type.
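As a rough illustration of the defaults above, a fixed-size chunker with overlap can be written in a few lines. This is a minimal sketch, not the skill's reference implementation: `chunk_tokens` is a hypothetical helper, and the whitespace split in the usage lines is only a stand-in for a real tokenizer (in practice, use the tokenizer that matches the embedding model so that "512 tokens" means what the model counts).

```python
def chunk_tokens(
    tokens: list[str], chunk_size: int = 512, overlap: int = 50
) -> list[list[str]]:
    """Split a token list into windows of `chunk_size` that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# Whitespace split is a placeholder for a real tokenizer.
words = ("token " * 1000).split()
print([len(c) for c in chunk_tokens(words)])  # → [512, 512, 76]
```

Each window starts 462 tokens after the previous one, so the last 50 tokens of one chunk reappear as the first 50 of the next.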
### Hybrid Search (Vector + Keyword)
**Hybrid Search = Vector Similarity + BM25 Keyword Matching**
```
User Query: "OAuth refresh token implementation"
                 │
          ┌──────┴──────┐
          │             │
   Vector Search   Keyword Search
     (Semantic)        (BM25)
          │             │
     Top 20 docs   Top 20 docs
          │             │
          └──────┬──────┘
                 │
      Reciprocal Rank Fusion
         (Merge + Re-rank)
                 │
        Final Top 5 Results
```
**Why hybrid matters:**
- Vector captures semantic meaning ("OAuth refresh" ≈ "token renewal")
- Keyword ensures exact matches ("refresh_token" literal)
- Combining both usually yields better retrieval quality than either method alone
See `references/hybrid-search.md` for implementation details.
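The fusion step in the diagram can be sketched without any database: Reciprocal Rank Fusion gives each document a score of `1 / (k + rank)` for every list it appears in, sums those scores, and sorts by the total. The function name and the document IDs below are made up for illustration. `k = 60` is the commonly used default and is an assumption here, not a value taken from this skill.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60, top_n: int = 5
) -> list[tuple[str, float]]:
    """Merge several ranked ID lists into one list ordered by summed RRF score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the best hit in each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused[:top_n]


vector_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical semantic results
keyword_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical BM25 results
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print([doc_id for doc_id, _ in fused])
# → ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Because only ranks are used, the raw vector and BM25 scores never need to be put on a common scale, which is the main reason RRF is a popular default for hybrid search.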
## Getting Started
### Python + Qdrant Example
```python
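from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance,
    FieldCondition,
    Filter,
    MatchValue,
    PointStruct,
    VectorParams,
)

# 1. Initialize client (assumes a Qdrant server listening on localhost:6333)
client = QdrantClient("localhost", port=6333)

# 2. Create collection; `size` must match the embedding model's dimension
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)

# 3. Insert documents with embeddings
# `chunks` is a placeholder: an iterable of (embedding, chunk_text) pairs
points = [
    PointStruct(
        id=idx,
        vector=embedding,  # From OpenAI/Voyage/etc.
        payload={
            "text": chunk_text,
            "source": "docs/api.md",
            "section": "Authentication",
        },
    )
    for idx, (embedding, chunk_text) in enumerate(chunks)
]
client.upsert(collection_name="documents", points=points)

# 4. Search with metadata filtering
# `query_embedding` is a placeholder: embed the query with the same model as the documents.
# `query_points` needs qdrant-client >= 1.10; older clients used
# `client.search(query_vector=...)`, which is deprecated in recent releases.
results = client.query_points(
    collection_name="documents",
    query=query_embedding,
    limit=5,
    query_filter=Filter(
        must=[
            FieldCondition(key="section", match=MatchValue(value="Authentication"))
        ]
    ),
).points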
```
For complete examples, see `examples/qdrant-python/`.
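To make the search step less of a black box, here is a brute-force version of a filtered cosine-similarity search in plain Python. It is a conceptual sketch only: the function names and sample points are hypothetical, and a real Qdrant collection answers the same kind of query through an approximate index rather than by scanning every vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_search(points, query_vector, limit=5, section=None):
    """Filter by payload first, then rank the survivors by cosine similarity."""
    candidates = [
        p for p in points
        if section is None or p["payload"].get("section") == section
    ]
    scored = [(cosine_similarity(p["vector"], query_vector), p) for p in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Toy 2-d vectors; real embeddings have hundreds or thousands of dimensions.
points = [
    {"id": 0, "vector": [1.0, 0.0], "payload": {"section": "Authentication"}},
    {"id": 1, "vector": [0.0, 1.0], "payload": {"section": "Authentication"}},
    {"id": 2, "vector": [1.0, 0.1], "payload": {"section": "Billing"}},
]
hits = brute_force_search(points, [1.0, 0.0], limit=5, section="Authentication")
print([p["id"] for _, p in hits])  # → [0, 1]
```

Point 2 is nearly parallel to the query but is excluded by the `section` filter, which is the same effect the `must` condition has in the Qdrant query.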
### TypeScript + Qdrant Example
```typescript
import { QdrantClient } from '@qdrant/js-client-rest';

const client = new QdrantClient({ url: 'http://localhost:6333' });

// Create collection
await client.createCollection('documents', {
  vectors: { size: 1024, distance: 'Cosine' }
});

// Insert documents
await client.upsert('documents', {
  points: chunks.map((chunk, idx) => ({
    id: idx,
    vector: chunk.embedding,
    payload: {
      text: chunk.text,
      source: chunk.source
    }
  }))
});

// Search
const results = await client.search('documents', {
  vector: queryEmbedding,
  limit: 5,
  filter: {
    must: [
      { key: 'source', match: { value: 'docs/api.md' } }
    ]
  }
});
```
For complete examples, see `examples/typescript-rag/`.
## RAG Pipeline Architecture
### Complete Pipeline Components
```
1. INGESTION
├─ Document Loading (PDF, web, code, Office)
├─ Text Extraction & Cleaning
├─ Chunking (semantic, recursive, code-aware)
└─ Embedding Generation (batch, rate-limited)
2. INDEXING
├─ Vector Store Insertion (batch upsert)
├─ Index Configuration (HNSW, distance metric)
└─ Keyword Index (BM25 for hybrid search)
3. RETRIEVAL (Query Time)
├─ Query Processing
```