
qdrant-hybrid-search-prefetches

Use when someone asks 'how to combine lexical and semantic retrieval', 'dense and sparse in one search?', 'how to combine multiple fields for retrieval?', 'payloads or sparse vectors for lexical?', 'which sparse embedding model to use?', 'BM25 vs SPLADE?'

Install in Claude Code
git clone --depth 1 https://github.com/qdrant/skills /tmp/qdrant-hybrid-search-prefetches && cp -r /tmp/qdrant-hybrid-search-prefetches/skills/qdrant-search-quality/search-strategies/hybrid-search/search-types ~/.claude/skills/qdrant-hybrid-search-prefetches
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Different Searches in One Query API Request

Each `prefetch` runs exactly one search for exactly one query.

Determine whether the user wants to run several parallel searches on:
1. The same vector representations but different queries or filters.
2. Different vector representations but the same raw query.

If it is the first, help the user design the application-side logic for constructing the queries and/or filters, then check [Combining Searches](../combining-searches/SKILL.md). Remember to create [indices on filterable payload fields](https://skills.qdrant.tech/md/documentation/manage-data/indexing/?s=payload-index) immediately after creating the collection and before the HNSW index is built, so that a filterable HNSW graph can be constructed.
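As a rough illustration of the first case, the sketch below builds a Query API request body as a plain Python dict, with two prefetches that reuse the same query vector but apply different filters. The vector name `dense`, the payload field `category`, its values, and the `rrf` fusion step are all assumptions made for the example; how the results should actually be merged is covered in Combining Searches.

```python
import json

def build_filtered_prefetch(query_vector, field, value, limit=20):
    """One prefetch = one search: a single query vector plus a single filter."""
    return {
        "query": query_vector,
        "using": "dense",  # hypothetical named vector
        "filter": {"must": [{"key": field, "match": {"value": value}}]},
        "limit": limit,
    }

query_vector = [0.12, -0.03, 0.88, 0.41]  # placeholder embedding

# Body for POST /collections/<name>/points/query
request_body = {
    "prefetch": [
        build_filtered_prefetch(query_vector, "category", "manuals"),
        build_filtered_prefetch(query_vector, "category", "faq"),
    ],
    "query": {"fusion": "rrf"},  # placeholder merge step, an assumption
    "limit": 10,
}

# Body for PUT /collections/<name>/index, sent right after collection creation
# so the filtered field is indexed before HNSW is built.
payload_index_body = {"field_name": "category", "field_schema": "keyword"}

print(json.dumps(request_body, indent=2))
```

Keeping the prefetch construction in one helper makes it easy to add or drop filtered branches on the application side without touching the rest of the request.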

If it is the second, use [named vectors](https://skills.qdrant.tech/md/documentation/manage-data/vectors/?s=named-vectors), which let you store multiple vector types per point in one collection. Beware that named vectors can currently be configured only at collection creation. To choose the vectors, check the following recommendations.
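Because named vectors are fixed at collection creation, it can help to write the configuration down before creating anything. The sketch below shows a collection-creation request body as a Python dict with one dense and one sparse named vector. The names `dense` and `sparse`, the size 384, and the cosine distance are assumptions for illustration, not recommendations.

```python
import json

# Body for PUT /collections/<name>
collection_config = {
    "vectors": {
        # Hypothetical dense representation; size must match the embedding model.
        "dense": {"size": 384, "distance": "Cosine"},
    },
    "sparse_vectors": {
        # Hypothetical sparse representation for lexical search.
        # The "idf" modifier asks Qdrant to apply IDF weighting at query time,
        # which is the usual setup for BM25-style vectors.
        "sparse": {"modifier": "idf"},
    },
}

print(json.dumps(collection_config, indent=2))
```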

## Missed Keyword Matches

Use when: pure vector search misses exact term or keyword matches and you need lexical retrieval alongside semantic search.

Most likely you need a sparse vector for exact text search alongside the dense one. Qdrant uses sparse vectors for lexical searches, as [payload filtering doesn't provide any ranking score](https://skills.qdrant.tech/md/documentation/search/text-search/?s=filtering-versus-querying).
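A minimal sketch of such a request, assuming a collection with named vectors called `dense` and `sparse` (both names are hypothetical) and an already computed sparse query: one prefetch covers semantic retrieval, the other lexical retrieval, and `rrf` fusion is used here only as a placeholder for the merge step.

```python
import json

def hybrid_query_body(dense_vector, sparse_indices, sparse_values,
                      limit=10, prefetch_limit=50):
    """Build a Query API body with one dense and one sparse prefetch."""
    if len(sparse_indices) != len(sparse_values):
        raise ValueError("sparse indices and values must have the same length")
    return {
        "prefetch": [
            # Semantic branch: dense embedding of the raw query.
            {"query": dense_vector, "using": "dense", "limit": prefetch_limit},
            # Lexical branch: sparse vector of the same raw query.
            {
                "query": {"indices": sparse_indices, "values": sparse_values},
                "using": "sparse",
                "limit": prefetch_limit,
            },
        ],
        "query": {"fusion": "rrf"},  # placeholder merge step, an assumption
        "limit": limit,
    }

body = hybrid_query_body(
    dense_vector=[0.1, 0.2, 0.3, 0.4],  # placeholder values
    sparse_indices=[17, 902, 4051],     # placeholder token ids
    sparse_values=[1.0, 0.6, 0.3],
)
print(json.dumps(body, indent=2))
```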

### Choose a Sparse Vector for Text
- **BM25**: statistical representation built into Qdrant core (computed server-side). A good baseline that works out-of-domain, usually best for long texts. Can be used for non-English content, but must be configured per language (tokenization, stemming, stopwords, etc.) at both indexing and retrieval time. More in the [Text Search Guide](https://skills.qdrant.tech/md/documentation/search/text-search/?s=bm25).
- **BM42**: learned sparse model based on BM25, but better suited to small chunks of text and with some understanding of meaning. Works only on English. Requires fine-tuning for domain-specific retrieval. Requires FastEmbed (Python/REST only, not available in all SDKs). Not maintained.
- **miniCOIL**: learned sparse model; BM25 with an additional understanding of word meaning in context. Works only on English. Requires fine-tuning for domain-specific retrieval. Requires FastEmbed. Usage is shown in the [FastEmbed miniCOIL documentation](https://skills.qdrant.tech/md/documentation/fastembed/fastembed-minicoil/).
- **SPLADE++**: learned sparse model with term expansion. Heavier inference and resource usage, but better retrieval quality thanks to term expansion. Requires fine-tuning for domain-specific retrieval. The versions provided in Qdrant Cloud Inference and FastEmbed work only on English. To use it with FastEmbed, check the [FastEmbed SPLADE documentation](https://skills.qdrant.tech/md/documentation/fastembed/fastembed-splade/).
- **External learned sparse embeddings**, for example BAAI/bge-m3.

What to remember when using sparse vectors for lexical search:
- tokenization and stemming affect exact matches, especially on custom codes, terms, etc.
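To see why this matters for custom codes, here is a small sketch comparing two hypothetical tokenizers (neither is Qdrant's actual tokenizer). A word-level tokenizer splits a part code into fragments, so a query for the full code can also match documents that merely share a fragment.

```python
import re

def word_tokenize(text):
    """Hypothetical tokenizer: lowercases and splits on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

def whitespace_tokenize(text):
    """Hypothetical tokenizer: lowercases and splits on whitespace only."""
    return text.lower().split()

doc = "Replace part ERR-4041-B before restart"

word_tokens = word_tokenize(doc)
space_tokens = whitespace_tokenize(doc)

print(word_tokens)   # the code is broken into 'err', '4041', 'b'
print(space_tokens)  # the code survives as the single token 'err-4041-b'
```

With the first tokenizer, a document containing `ERR-4041-A` shares two of three tokens with the query `ERR-4041-B`; with the second, the two codes are entirely different terms.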

What to remember when using Qdrant BM25 and miniCOIL (based on BM25):
- `avg_len` in the BM25 formula is not computed server-side; it is the user's responsibility and is passed as a parameter.
- BM25 might not work well for small chunks of text, as the algorithm was originally created for search over long documents; consider adjusting the document statistics used in sparse vectors (TF and IDF, `k`, `b`).
- Qdrant BM25 vectors are configured per language, so consider customizing stop words, stemming, and tokenization when users' documents mix several languages, or carefully configure vectors per point when they are monolingual.
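The effect of `avg_len`, `k`, and `b` can be checked with the textbook BM25 term-frequency component. The sketch below uses that standard formula and common default values as an assumption; it is not taken from Qdrant's implementation, and the numbers are made up for illustration.

```python
def bm25_term_weight(tf, doc_len, avg_len, k=1.2, b=0.75):
    """Textbook BM25 term-frequency component (IDF is applied separately)."""
    length_norm = 1.0 - b + b * (doc_len / avg_len)
    return tf * (k + 1.0) / (tf + k * length_norm)

chunk_len = 40  # a short chunk, in tokens

# avg_len matching the real corpus of short chunks
w_correct = bm25_term_weight(tf=1, doc_len=chunk_len, avg_len=40)

# avg_len left at a value suited to long documents
w_inflated = bm25_term_weight(tf=1, doc_len=chunk_len, avg_len=256)

# b=0 switches length normalization off entirely
w_no_norm = bm25_term_weight(tf=1, doc_len=chunk_len, avg_len=256, b=0.0)

print(w_correct, w_inflated, w_no_norm)
```

An `avg_len` that is too large makes every short chunk look "shorter than average" and inflates its term weights, which is one reason to measure the average length on your own data rather than reuse a default.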

More on [Sparse Vectors for Text Search](https://skills.qdrant.tech/md/course/essentials/day-3/sparse-retrieval-demo/)

## Need to Combine Multiple Representations of the Same Item

Use when: the same item is embedded in multiple ways (e.g. different models, languages, or modalities) and you want to search across several representations in one request (not necessarily all of them; even a single one is fine).

Use multiple named-vector prefetches, with each prefetch covering one representation.

- If you have groups and subgroups of representations (document -> chunk, image -> patch), you can use [searching in groups](https://skills.qdrant.tech/md/documentation/search/search/?s=search-groups). To avoid storing identical payloads several times, check [Lookup in Groups](https://skills.qdrant.tech/md/documentation/search/search/#lookup-in-groups).
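For the document -> chunk case, a grouped query with a lookup might look like the sketch below, written as a request body for the Query API groups endpoint. The collection name `documents`, the payload field `document_id`, the vector name `chunk_dense`, and the looked-up field `title` are all hypothetical.

```python
import json

def grouped_chunk_query(query_vector, groups=3, chunks_per_group=2):
    """Body for POST /collections/<chunks_collection>/points/query/groups."""
    return {
        "query": query_vector,
        "using": "chunk_dense",        # hypothetical named vector on chunks
        "group_by": "document_id",     # hypothetical payload field linking chunk -> document
        "limit": groups,               # number of groups (documents) to return
        "group_size": chunks_per_group,  # best chunks kept per document
        "with_lookup": {
            # Shared document payload lives once in a separate collection.
            "collection": "documents",
            "with_payload": ["title"],
            "with_vectors": False,
        },
    }

body = grouped_chunk_query([0.2, 0.1, 0.7, 0.4])  # placeholder embedding
print(json.dumps(body, indent=2))
```

The lookup keeps chunk points small: each chunk stores only the `document_id`, while document-level payload is fetched once per group.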

You can also search directly on [multivectors](https://skills.qdrant.tech/md/documentation/manage-data/vectors/?s=multivectors), a matrix of dense vectors, in a prefetch.

However, this comes with several considerations: multivectors were designed to support late interaction models using the max similarity metric, so it is impossible to retrieve the list of individual max similarity scores for each query vector.

Moreover, multivectors are rarely a good pick for a prefetch:
- the max similarity metric is not symmetric, so [using an HNSW index with it can be problematic](https://skills.qdrant.tech/md/course/multi-vector-search/module-1/maxsim-distance/#the-hnsw-challenge)
- [multivector representations are very heavy, as is the search process on them](https://skills.qdrant.tech/md/course/multi-vector-search/module-1/problems-multi-vector).
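The asymmetry is easy to verify by hand. The sketch below implements the usual late-interaction MaxSim score (sum over query vectors of the best dot product against the document vectors) in plain Python with toy two-dimensional vectors; it is an illustration of the metric, not Qdrant's code.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def max_sim(query_vectors, doc_vectors):
    """Sum, over query vectors, of the best dot product with any doc vector.

    Only the total is returned; the per-query-vector maxima are not exposed,
    which mirrors what a multivector search gives back.
    """
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)

a = [[1.0, 0.0]]
b = [[1.0, 0.0], [0.5, 0.5]]

score_ab = max_sim(a, b)  # a as query, b as document
score_ba = max_sim(b, a)  # roles swapped

print(score_ab, score_ba)  # 1.0 1.5
```

Swapping query and document changes the score, so the metric does not behave like a distance between two points, which is what graph-based indexes such as HNSW are built around.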

There are ways to make multivector retrieval cheaper (MUVERA, pooling); see ["Evaluating Tradeoffs of Multi-stage Multi-vector Search"](https://skills.qdrant.tech/md/course/multi-vector-search/module-3/evaluating-pipelines/) for more.

## What NOT to Do
- Choose a search method (for example, BM25) without evaluating its quality and the resources it uses.
- Use a search method (for example, BM25) without paying attention to the specifics of its configuration and its applicability to the use case.