Vector Databases in 2026: Benchmark Comparisons of Qdrant, Milvus, and pgvector
Building RAG pipelines in 2026 feels a lot different than it did two years ago. Back then, we were just throwing embeddings into whatever index was easiest to spin up. Now, with production workloads scaling into the hundreds of millions of vectors and multi-modal requirements becoming the standard, the choice of vector store is a critical architectural decision.
I recently finished a performance audit comparing Qdrant, Milvus, and pgvector. My goal was to see how they handle high-concurrency retrieval while maintaining sub-50ms latency. Here is what I found.
The Contenders: A Quick Reality Check
- Qdrant: Still my go-to for Rust-based performance. Its HNSW implementation is highly optimized, and the filtering capabilities are the most intuitive I've worked with.
- Milvus: This is the heavy lifter. If you are running a distributed cluster across multiple nodes with massive throughput needs, Milvus handles the complexity better than anything else.
- pgvector: The "I already have Postgres" choice. It has matured significantly. With the introduction of `hnsw` index support and better integration with `ivfflat` for specific memory constraints, it’s no longer just a toy for small projects.
Performance Benchmarks: The Numbers
In my test setup, I used a dataset of 5 million 1536-dimensional vectors. I ran a mix of exact (KNN) and approximate (ANN) nearest-neighbor searches under a load of 500 concurrent users.
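For a sense of scale, here is a rough back-of-the-envelope sketch of what that dataset weighs in memory. It assumes 4-byte float32 components and counts only the raw vectors; index structures (HNSW graph links), per-row overhead, and metadata come on top and vary by engine. The helper name `raw_vector_bytes` is made up for this illustration.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_float: int = 4) -> int:
    """Bytes needed for the raw vectors alone (no index, no metadata)."""
    return num_vectors * dims * bytes_per_float


total = raw_vector_bytes(5_000_000, 1536)
print(f"{total:,} bytes = {total / 2**30:.1f} GiB")
# prints "30,720,000,000 bytes = 28.6 GiB"
```

So even before any index is built, the working set is close to 29 GiB, which is worth keeping in mind when reading the memory row in the table.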
| Feature | Qdrant | Milvus | pgvector |
|---|---|---|---|
| P99 Latency | 22ms | 35ms | 48ms |
| Throughput (RPS) | 1200 | 2800 | 650 |
| Ease of Setup | High | Low | Very High |
| Memory Usage | Moderate | High | Moderate |
Milvus wins on pure throughput, but the operational overhead of managing etcd and MinIO alongside the proxy nodes is a headache I usually try to avoid unless I absolutely need the scale. Qdrant hits the sweet spot for most of my mid-sized SaaS projects.
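If you want to reproduce numbers like the P99 latency and throughput rows above on your own data, a minimal harness could look like the sketch below. It is not the tooling behind the table; `fake_search` is a hypothetical stand-in for whatever client call you are measuring, and the nearest-rank percentile is one of several common definitions.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def run_load(search_fn, queries, concurrency):
    """Run search_fn over all queries on a thread pool; return (latencies_ms, rps)."""
    def timed(query):
        start = time.perf_counter()
        search_fn(query)
        return (time.perf_counter() - start) * 1000.0

    queries = list(queries)
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, queries))
    elapsed = time.perf_counter() - wall_start
    return latencies, len(queries) / elapsed


def fake_search(query):
    """Hypothetical placeholder for a real Qdrant, Milvus, or pgvector query."""
    time.sleep(0.002)


if __name__ == "__main__":
    latencies, rps = run_load(fake_search, range(200), concurrency=20)
    print(f"P99: {percentile(latencies, 99):.1f} ms, throughput: {rps:.0f} RPS")
```

Threads are enough here because the work is I/O-bound; for a CPU-heavy client you would want separate processes or a dedicated load-testing tool.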
Implementation: Integrating pgvector for RAG
If you are already deep in the Postgres ecosystem, don't jump ship to a dedicated vector DB just yet. With pgvector, you can handle vector search right next to your relational data, which saves you from building complex synchronization logic between two databases.
Here is how I recently set up a search-optimized table in a production migration:
```python
import psycopg2
from pgvector.psycopg2 import register_vector

# Connect to your Postgres instance
conn = psycopg2.connect("dbname=app_db user=dev_user")

with conn.cursor() as cur:
    # Enable the extension
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")

    # Create the table and an HNSW index for fast approximate search
    # m=16: max connections per node in each graph layer
    # ef_construction=64: candidate list size while building the index
    # (higher values improve recall but slow down the build)
    # The index is named so IF NOT EXISTS can skip it when the migration re-runs.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS document_embeddings (
            id SERIAL PRIMARY KEY,
            content TEXT,
            embedding VECTOR(1536)
        );
        CREATE INDEX IF NOT EXISTS document_embeddings_embedding_hnsw_idx
            ON document_embeddings
            USING hnsw (embedding vector_cosine_ops)
            WITH (m = 16, ef_construction = 64);
    """)
conn.commit()

# Register the vector type only after the extension exists;
# register_vector looks the type up in the database and fails otherwise.
register_vector(conn)

# Debug tip: If your queries are slow, check your memory settings.
# Ensure 'shared_buffers' is high enough to cache the index pages.
```
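Once the table and index exist, retrieval is a single `ORDER BY ... LIMIT` query using pgvector's cosine distance operator `<=>`. The sketch below is one way to write it, not the only one: `search_similar` and `to_vector_literal` are hypothetical helper names, `conn` is assumed to be an open psycopg2 connection with autocommit off (the psycopg2 default), and the embedding is passed as a text literal cast to `vector` so the helper works whether or not `register_vector` has been called.

```python
def to_vector_literal(embedding):
    """Format a sequence of numbers as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


# The ORDER BY repeats the distance expression so the HNSW index can be used.
SEARCH_SQL = """
    SELECT id, content, embedding <=> %s::vector AS cosine_distance
    FROM document_embeddings
    ORDER BY embedding <=> %s::vector
    LIMIT %s
"""


def search_similar(conn, query_embedding, top_k=5, ef_search=40):
    """Return the top_k nearest rows by cosine distance.

    conn is assumed to be a psycopg2 connection to the database that
    holds the document_embeddings table.
    """
    literal = to_vector_literal(query_embedding)
    with conn.cursor() as cur:
        # hnsw.ef_search trades speed for recall at query time (pgvector's
        # default is 40). SET LOCAL scopes the change to this transaction.
        cur.execute(f"SET LOCAL hnsw.ef_search = {int(ef_search)}")
        cur.execute(SEARCH_SQL, (literal, literal, top_k))
        return cur.fetchall()
```

A call such as `search_similar(conn, query_vec, top_k=10, ef_search=100)` would then return `(id, content, cosine_distance)` tuples, nearest first. Raising `ef_search` is usually the first knob to try when recall looks low.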
Architectural Trade-offs
When choosing between these, I look at the "Maintenance Tax."
- The Postgres Trap: Using pgvector is great until your vector table grows to 50M+ rows. At that point, vacuuming and index maintenance start to compete with your primary application transactions for I/O and memory, and a non-concurrent index rebuild blocks writes to the table outright. If your app is write-heavy, move the vectors to a separate dedicated instance.
- Qdrant’s Filtering: I prefer Qdrant when my metadata filtering is complex. Their "payload" filtering happens at the same time as the vector scan. In other systems, you often have to filter after the vector search, which kills performance (and can leave you with too few results) when the filter is restrictive.
- Milvus Complexity: Use Milvus only if your team is large enough to have a dedicated SRE. The distributed architecture is powerful, but debugging a split-brain scenario in a vector cluster at 3 AM is not something I wish on anyone.
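To make the filtering point above concrete, here is a toy brute-force comparison of post-filtering versus filtering during the scan. It only illustrates the difference in results; it is not how Qdrant (or any of these engines) implements filtered HNSW search internally, and the data, the `tenant` payload field, and the function names are all made up for the example.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def post_filter_search(points, query, k, predicate):
    """Take the k nearest points first, then drop those failing the filter."""
    nearest = sorted(points, key=lambda p: cosine_distance(p["vector"], query))[:k]
    return [p for p in nearest if predicate(p["payload"])]


def filtered_search(points, query, k, predicate):
    """Apply the filter while scanning, so all k slots go to matching points."""
    matching = (p for p in points if predicate(p["payload"]))
    return sorted(matching, key=lambda p: cosine_distance(p["vector"], query))[:k]


def is_acme(payload):
    return payload["tenant"] == "acme"


# Hypothetical data: 100 unit vectors, only every 25th belongs to tenant "acme".
points = [
    {
        "id": i,
        "vector": [math.cos(i / 100), math.sin(i / 100)],
        "payload": {"tenant": "acme" if i % 25 == 0 else "other"},
    }
    for i in range(100)
]
query = [math.cos(0.1), math.sin(0.1)]  # points exactly at point 10

print([p["id"] for p in post_filter_search(points, query, 3, is_acme)])  # prints []
print([p["id"] for p in filtered_search(points, query, 3, is_acme)])     # prints [0, 25, 50]
```

With a restrictive filter, the post-filter variant comes back empty because none of the three nearest neighbors match, so a real system would have to over-fetch and retry. Filtering during the scan fills all three slots in one pass.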
Final Thoughts
If you're building a prototype, start with pgvector. It’s simple, robust, and lets you iterate on your embedding model without managing extra infrastructure. Once you start hitting performance bottlenecks or see your Postgres CPU usage spiking during search queries, migrate your vector data to Qdrant. It offers the best balance of developer experience and raw speed for most 2026-era AI applications.
Don't over-engineer early. Get the search working, measure your latency, and only scale the architecture when the metrics tell you it’s time.
Aditya Shenvi