faiss
FAISS is Facebook AI's high-performance library for similarity search across millions or billions of dense vectors using GPU acceleration and various index types like Flat, IVF, and HNSW. Use it when you need fast k-nearest neighbor retrieval on large-scale vector datasets without metadata filtering requirements, particularly for offline batch processing or latency-critical applications where pure vector similarity is the primary concern.
git clone --depth 1 https://github.com/NousResearch/hermes-agent /tmp/faiss && cp -r /tmp/faiss/optional-skills/mlops/faiss ~/.claude/skills/faissSKILL.md
# FAISS - Efficient Similarity Search
Facebook AI's library for billion-scale vector similarity search.
## When to use FAISS
**Use FAISS when:**
- Need fast similarity search on large vector datasets (millions/billions)
- GPU acceleration required
- Pure vector similarity (no metadata filtering needed)
- High throughput, low latency critical
- Offline/batch processing of embeddings
**Metrics**:
- **31,700+ GitHub stars**
- Meta/Facebook AI Research
- **Handles billions of vectors**
- **C++** with Python bindings
**Use alternatives instead**:
- **Chroma/Pinecone**: Need metadata filtering
- **Weaviate**: Need full database features
- **Annoy**: Simpler, fewer features
## Quick start
### Installation
```bash
# CPU only
pip install faiss-cpu
# GPU support
pip install faiss-gpu
```
### Basic usage
```python
import faiss
import numpy as np
# Create sample data (1000 vectors, 128 dimensions)
d = 128
nb = 1000
vectors = np.random.random((nb, d)).astype('float32')
# Create index
index = faiss.IndexFlatL2(d) # L2 distance
index.add(vectors) # Add vectors
# Search
k = 5 # Find 5 nearest neighbors
query = np.random.random((1, d)).astype('float32')
distances, indices = index.search(query, k)
print(f"Nearest neighbors: {indices}")
print(f"Distances: {distances}")
```
## Index types
### 1. Flat (exact search)
```python
# L2 (Euclidean) distance
index = faiss.IndexFlatL2(d)
# Inner product (cosine similarity if normalized)
index = faiss.IndexFlatIP(d)
# Slowest, most accurate
```
### 2. IVF (inverted file) - Fast approximate
```python
# Create quantizer
quantizer = faiss.IndexFlatL2(d)
# IVF index with 100 clusters
nlist = 100
index = faiss.IndexIVFFlat(quantizer, d, nlist)
# Train on data
index.train(vectors)
# Add vectors
index.add(vectors)
# Search (nprobe = clusters to search)
index.nprobe = 10
distances, indices = index.search(query, k)
```
### 3. HNSW (Hierarchical NSW) - Best quality/speed
```python
# HNSW index
M = 32 # Number of connections per layer
index = faiss.IndexHNSWFlat(d, M)
# No training needed
index.add(vectors)
# Search
distances, indices = index.search(query, k)
```
### 4. Product Quantization - Memory efficient
```python
# PQ reduces memory by 16-32×
m = 8 # Number of subquantizers
nbits = 8
index = faiss.IndexPQ(d, m, nbits)
# Train and add
index.train(vectors)
index.add(vectors)
```
## Save and load
```python
# Save index
faiss.write_index(index, "large.index")
# Load index
index = faiss.read_index("large.index")
# Continue using
distances, indices = index.search(query, k)
```
## GPU acceleration
```python
# Single GPU
res = faiss.StandardGpuResources()
index_cpu = faiss.IndexFlatL2(d)
index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu) # GPU 0
# Multi-GPU
index_gpu = faiss.index_cpu_to_all_gpus(index_cpu)
# 10-100× faster than CPU
```
## LangChain integration
```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
# Create FAISS vector store
vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())
# Save
vectorstore.save_local("faiss_index")
# Load
vectorstore = FAISS.load_local(
"faiss_index",
OpenAIEmbeddings(),
allow_dangerous_deserialization=True
)
# Search
results = vectorstore.similarity_search("query", k=5)
```
## LlamaIndex integration
```python
from llama_index.vector_stores.faiss import FaissVectorStore
import faiss
# Create FAISS index
d = 1536
faiss_index = faiss.IndexFlatL2(d)
vector_store = FaissVectorStore(faiss_index=faiss_index)
```
## Best practices
1. **Choose right index type** - Flat for <10K, IVF for 10K-1M, HNSW for quality
2. **Normalize for cosine** - Use IndexFlatIP with normalized vectors
3. **Use GPU for large datasets** - 10-100× faster
4. **Save trained indices** - Training is expensive
5. **Tune nprobe/ef_search** - Balance speed/accuracy
6. **Monitor memory** - PQ for large datasets
7. **Batch queries** - Better GPU utilization
## Performance
| Index Type | Build Time | Search Time | Memory | Accuracy |
|------------|------------|-------------|--------|----------|
| Flat | Fast | Slow | High | 100% |
| IVF | Medium | Fast | Medium | 95-99% |
| HNSW | Slow | Fastest | High | 99% |
| PQ | Medium | Fast | Low | 90-95% |
## Resources
- **GitHub**: https://github.com/facebookresearch/faiss ⭐ 31,700+
- **Wiki**: https://github.com/facebookresearch/faiss/wiki
- **License**: MITOperate the Antigravity CLI (agy): plugins, auth, sandbox.
Delegate coding tasks to Blackbox AI CLI agent. Multi-model agent with built-in judge that runs tasks through multiple LLMs and picks the best result. Requires the blackbox CLI and a Blackbox AI API key.
Delegate coding to xAI Grok Build CLI (features, PRs).
Configure and use Honcho memory with Hermes -- cross-session user modeling, multi-profile peer isolation, observation config, dialectic reasoning, session summaries, and context budget enforcement. Use when setting up Honcho, troubleshooting memory, managing profiles with Honcho peers, or tuning observation, recall, and dialectic settings.
Delegate coding to OpenHands CLI (model-agnostic, LiteLLM).
Read-only EVM client: wallets, tokens, gas across 8 chains.
Hyperliquid market data, account history, trade review.
Query Solana blockchain data with USD pricing — wallet balances, token portfolios with values, transaction details, NFTs, whale detection, and live network stats. Uses Solana RPC + CoinGecko. No API key required.