qdrant-monitoring-debugging
Diagnoses Qdrant production issues using metrics and observability tools. Use when someone reports 'optimizer stuck', 'indexing too slow', 'memory too high', 'OOM crash', 'queries are slow', 'latency spike', or 'search was fast now it's slow'. Also use when performance degrades without obvious config changes.
git clone --depth 1 https://github.com/qdrant/skills /tmp/qdrant-monitoring-debugging && cp -r /tmp/qdrant-monitoring-debugging/skills/qdrant-monitoring/debugging ~/.claude/skills/qdrant-monitoring-debuggingSKILL.md
# How to Debug Qdrant with Metrics
First check optimizer status. Most production issues trace back to active optimizations competing for resources. If optimizer is clean, check memory, then request metrics.
## Optimizer Stuck or Too Slow
Use when: optimizer running for hours, not finishing, or showing errors.
- Use `/collections/{collection_name}/optimizations` endpoint (v1.17+) to check status [Optimization monitoring](https://skills.qdrant.tech/md/documentation/ops-optimization/optimizer/?s=optimization-monitoring)
- Query with optional detail flags: `?with=queued,completed,idle_segments`
- Returns: queued optimizations count, active optimizer type, involved segments, progress tracking
- Web UI has an Optimizations tab with timeline view and per-task duration metrics [Web UI](https://skills.qdrant.tech/md/documentation/ops-optimization/optimizer/?s=web-ui)
- If `optimizer_status` shows an error in collection info, check logs for disk full or corrupted segments
- Large merges and HNSW rebuilds legitimately take hours on big datasets. Check progress before assuming it's stuck.
## Memory Seems Too High
Use when: memory exceeds expectations, node crashes with OOM, or memory keeps growing.
- Process memory metrics available via `/metrics` (RSS, allocated bytes, page faults)
- Qdrant uses two types of RAM: resident memory (data structures, quantized vectors) and OS page cache (cached disk reads). Page cache filling available RAM is normal. [Memory article](https://qdrant.tech/articles/memory-consumption/)
- If resident memory (RSSAnon) exceeds 80% of total RAM, investigate
- Check `/telemetry` for per-collection breakdown of point counts and vector configurations
- Estimate expected memory: `num_vectors * dimensions * 4 bytes * 1.5` for vectors, plus payload and index overhead [Capacity planning](https://skills.qdrant.tech/md/documentation/capacity-planning/)
- Common causes of unexpected growth: quantized vectors with `always_ram=true`, too many payload indexes, large `max_segment_size` during optimization
## Queries Are Slow
Use when: queries slower than expected and you need to identify the cause.
- Track `rest_responses_avg_duration_seconds` and `rest_responses_max_duration_seconds` per endpoint
- Use histogram metric `rest_responses_duration_seconds` (v1.8+) for percentile analysis in Grafana
- Equivalent gRPC metrics with `grpc_responses_` prefix
- Check optimizer status first. Active optimizations compete for CPU and I/O, degrading search latency.
- Check segment count via collection info. Too many unmerged segments after bulk upload causes slower search.
- Compare filtered vs unfiltered query times. Large gap means missing payload index. [Payload index](https://skills.qdrant.tech/md/documentation/manage-data/indexing/?s=payload-index)
## What NOT to Do
- Ignore optimizer status when debugging slow queries (most common root cause)
- Assume memory leak when page cache fills RAM (normal OS behavior)
- Make config changes while optimizer is running (causes cascading re-optimizations)
- Blame Qdrant before checking if bulk upload just finished (unmerged segments)Qdrant provides client SDKs for various programming languages, allowing easy integration with Qdrant deployments.
Guides Qdrant deployment selection. Use when someone asks 'how to deploy Qdrant', 'Docker vs Cloud', 'local mode', 'embedded Qdrant', 'Qdrant EDGE', 'which deployment option', 'self-hosted vs cloud', or 'need lowest latency deployment'. Also use when choosing between deployment types for a new project.
Guides embedding model migration in Qdrant without downtime. Use when someone asks 'how to switch embedding models', 'how to migrate vectors', 'how to update to a new model', 'zero-downtime model change', 'how to re-embed my data', or 'can I use two models at once'. Also use when upgrading model dimensions, switching providers, or A/B testing models.
Guides Qdrant monitoring and observability setup. Use when someone asks 'how to monitor Qdrant', 'what metrics to track', 'is Qdrant healthy', 'optimizer stuck', 'why is memory growing', 'requests are slow', or needs to set up Prometheus, Grafana, or health checks. Also use when debugging production issues that require metric analysis.
Guides Qdrant monitoring setup including Prometheus scraping, health probes, Hybrid Cloud metrics, alerting, and log centralization. Use when someone asks 'how to set up monitoring', 'Prometheus config', 'Grafana dashboard', 'health check endpoints', 'how to scrape Hybrid Cloud', 'what alerts to set', 'how to centralize logs', or 'audit logging'.
Different techniques to optimize the performance of Qdrant, including indexing strategies, query optimization, and hardware considerations. Use when you want to improve the speed and efficiency of your Qdrant deployment.
Diagnoses and fixes slow Qdrant indexing and data ingestion. Use when someone reports 'uploads are slow', 'indexing takes forever', 'optimizer is stuck', 'HNSW build time too long', or 'data uploaded but search is bad'. Also use when optimizer status shows errors, segments won't merge, or indexing threshold questions arise.
Diagnoses and reduces Qdrant memory usage. Use when someone reports 'memory too high', 'RAM keeps growing', 'node crashed', 'out of memory', 'memory leak', or asks 'why is memory usage so high?', 'how to reduce RAM?'. Also use when memory doesn't match calculations, quantization didn't help, or nodes crash during recovery.