Hybrid Search Architectures: Integrating BM25 with Dense Embeddings in pgvector
Retrieval-Augmented Generation (RAG) pipelines often fail when they rely solely on vector search. If a user searches for a specific product SKU or a niche technical term, dense embeddings, which excel at capturing semantic intent, often miss the exact match. That's why hybrid search has become a common default for production-grade retrieval. Combining the lexical precision of BM25 with the conceptual depth of vector embeddings covers both cases.
The Strategy: Why Hybrid Matters
When I build search systems, I treat vectors as the "what does this mean?" layer and BM25 as the "what is this called?" layer.
With pgvector, this integration is powerful because you keep your data in a single source of truth. You don't need to manage a separate Elasticsearch index and a Postgres database. You compute a vector ranking (typically by cosine distance or inner product) and a lexical ranking (BM25-style) and then apply Reciprocal Rank Fusion (RRF) to merge them. RRF works here because it ignores raw scores entirely: each document earns 1 / (k + rank) from every list it appears in, and the contributions are summed. Since only rank positions matter, scores from two different mathematical distributions never have to be put on a common scale, which prevents one method from completely drowning out the other.
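To make the fusion step concrete, here is a minimal sketch of RRF in plain Python, outside the database. The function name `reciprocal_rank_fusion` and the document ids are made up for illustration; only the 1 / (k + rank) formula and k = 60 come from the approach described above.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document ids into one list.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Raw scores are never consulted.
    """
    fused: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical ids: "sku-123" is the exact lexical hit, "guide" the semantic one.
vector_ranking = ["guide", "faq", "sku-123"]
lexical_ranking = ["sku-123", "guide"]

fused = reciprocal_rank_fusion([vector_ranking, lexical_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.6f}")
```

A document that appears in both lists collects two contributions, which is why "sku-123" ends up ahead of "faq" even though the vector ranking alone placed it last.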
Implementation: The pgvector Setup
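To make this work, I use the tsvector and tsquery features in Postgres alongside pgvector. One caveat: the built-in ts_rank is a term-frequency ranking, not true BM25 (it has no inverse document frequency term), so it serves here as the lexical stand-in. Because RRF only consumes rank positions, the same pattern applies unchanged if you swap in an extension that provides real BM25 scoring. Here is a clean implementation pattern using Python and SQLAlchemy.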
from sqlalchemy import text
from sqlalchemy.orm import Session


def hybrid_search(db: Session, query_text: str, query_vector: list[float], top_k: int = 10):
    """
    Combines lexical ranking (ts_rank, standing in for BM25) and vector
    search (cosine distance, <=>) using Reciprocal Rank Fusion (RRF).
    """
    # pgvector's array-to-vector casts are assignment-only, so a bare Python
    # list typically fails with "operator does not exist". Send the text
    # form '[0.1,0.2,...]' and cast it explicitly instead.
    vector_literal = "[" + ",".join(str(float(x)) for x in query_vector) + "]"

    # We use CTEs to rank both methods independently, then combine them.
    # k=60 is a standard constant for RRF to balance rank importance.
    sql = text("""
        WITH vector_matches AS (
            SELECT id,
                   1.0 / (60 + row_number() OVER (
                       ORDER BY embedding <=> CAST(:vector AS vector))) AS score
            FROM documents
            ORDER BY embedding <=> CAST(:vector AS vector)
            LIMIT 100
        ),
        lexical_matches AS (
            SELECT id,
                   1.0 / (60 + row_number() OVER (
                       ORDER BY ts_rank(text_search_col, websearch_to_tsquery('english', :query)) DESC)) AS score
            FROM documents
            WHERE text_search_col @@ websearch_to_tsquery('english', :query)
            ORDER BY ts_rank(text_search_col, websearch_to_tsquery('english', :query)) DESC
            LIMIT 100
        )
        SELECT id, SUM(score) AS combined_score
        FROM (
            SELECT * FROM vector_matches
            UNION ALL
            SELECT * FROM lexical_matches
        ) combined
        GROUP BY id
        ORDER BY combined_score DESC
        LIMIT :top_k;
    """)
    params = {"vector": vector_literal, "query": query_text, "top_k": top_k}
    return db.execute(sql, params).fetchall()
Architectural Trade-offs
1. Indexing Overhead
Adding a GIN index on your tsvector column is mandatory for performance, but it slows down INSERT and UPDATE operations. If your data changes every second, you’ll notice write contention. I usually recommend a background worker to update the tsvector column rather than doing it in the main transaction block.
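One way the background worker could look is sketched below. It assumes things the setup above does not specify: a hypothetical `body` text column holding the source text, and a convention that rows are written with text_search_col set to NULL (and reset to NULL when the text changes) so the worker can find them. The `run_batch` callable is also an assumption; it stands for any wrapper that executes the statement, commits, and returns the number of rows updated.

```python
from typing import Callable

# Assumed names: `body` holds the source text, and pending rows are the ones
# whose text_search_col is still NULL. coalesce() guarantees a non-NULL
# result, so a processed row is never picked up again.
REFRESH_BATCH_SQL = """
UPDATE documents
SET text_search_col = to_tsvector('english', coalesce(body, ''))
WHERE id IN (
    SELECT id
    FROM documents
    WHERE text_search_col IS NULL
    ORDER BY id
    LIMIT :batch_size
    FOR UPDATE SKIP LOCKED
)
"""


def refresh_pending(
    run_batch: Callable[[str, dict], int],
    batch_size: int = 500,
    max_batches: int = 1000,
) -> int:
    """Call run_batch until it reports zero updated rows; return the total."""
    total = 0
    for _ in range(max_batches):
        updated = run_batch(REFRESH_BATCH_SQL, {"batch_size": batch_size})
        if updated == 0:
            break
        total += updated
    return total
```

Small batches keep each transaction short, and FOR UPDATE SKIP LOCKED lets several workers run side by side without waiting on each other's rows. With SQLAlchemy, `run_batch` could wrap `db.execute(text(sql), params)`, commit, and return the result's `rowcount`.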
2. The Normalization Problem
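Raw BM25 scores and vector distances exist on different scales, and they even point in different directions: a higher lexical score is better, while a lower vector distance is better. You cannot simply add them. If you try (vector_score * 0.5) + (bm25_score * 0.5), whichever signal has the larger numeric range dominates the sum and the other is effectively ignored. RRF (as shown above) sidesteps this by using ranks only, which makes it a robust default that avoids manual weight tuning.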
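The effect is easy to reproduce with a few numbers. The scores below are invented for illustration, not measured: they only mimic the typical situation where cosine similarities cluster high and ts_rank values are an order of magnitude smaller.

```python
# Illustrative numbers only (not measured).
cosine_similarity = {"guide": 0.82, "faq": 0.79, "sku-123": 0.61}
ts_rank_score = {"sku-123": 0.09, "guide": 0.02, "faq": 0.01}


def rank_of(scores: dict[str, float]) -> dict[str, int]:
    """Map each id to its 1-based rank, highest score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {doc_id: position for position, doc_id in enumerate(ordered, start=1)}


vector_rank = rank_of(cosine_similarity)
lexical_rank = rank_of(ts_rank_score)


def weighted_sum(doc_id: str) -> float:
    """Naive 50/50 blend of the raw scores."""
    return 0.5 * cosine_similarity[doc_id] + 0.5 * ts_rank_score[doc_id]


def rrf(doc_id: str, k: int = 60) -> float:
    """Rank-based fusion: the raw score magnitudes play no role."""
    return 1.0 / (k + vector_rank[doc_id]) + 1.0 / (k + lexical_rank[doc_id])


by_weighted_sum = sorted(cosine_similarity, key=weighted_sum, reverse=True)
by_rrf = sorted(cosine_similarity, key=rrf, reverse=True)

print("weighted sum:", by_weighted_sum)
print("rrf:         ", by_rrf)
```

With these numbers the weighted sum reproduces the vector ordering exactly, so the lexical signal contributes nothing, while RRF lifts the exact-match document "sku-123" from last place to second.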
3. Memory Usage
If your dataset exceeds a few million rows, keep an eye on your work_mem. The sorts behind each ranking can be memory-intensive. I've found that setting a reasonable LIMIT inside the CTEs (as I did with LIMIT 100) prevents the database from trying to rank every single row in the table; in my experience, that keeps latency under 100ms.
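If you want to adjust these knobs per query rather than server-wide, a sketch like the following can help. It builds transaction-local settings using Postgres's set_config function. The helper name `tuning_statements` and the "32MB" default are placeholders, not recommendations. The second setting is included on the assumption that the vector column uses an HNSW index: by default pgvector's HNSW scan returns at most hnsw.ef_search candidates (40 unless changed), which is fewer than the LIMIT 100 used in the CTE.

```python
def tuning_statements(candidate_limit: int = 100, work_mem: str = "32MB") -> list[tuple[str, dict]]:
    """Build transaction-local settings to run before the hybrid query.

    set_config(name, value, true) behaves like SET LOCAL but accepts bind
    parameters, so the values do not have to be spliced into the SQL string.
    """
    return [
        # Placeholder value; size it to your own sort workload.
        ("SELECT set_config('work_mem', :value, true)", {"value": work_mem}),
        # Assumes an HNSW index: keep ef_search at least as large as the
        # LIMIT inside the vector CTE, or the CTE comes back short.
        ("SELECT set_config('hnsw.ef_search', :value, true)", {"value": str(candidate_limit)}),
    ]


for statement, params in tuning_statements(candidate_limit=100, work_mem="64MB"):
    print(statement, params)
```

Because the third argument to set_config is true, both settings revert when the transaction ends, so they need to be executed in the same transaction as the search query.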
Debugging Tips
- Check the Explain Plan: Always run EXPLAIN ANALYZE on your hybrid query. If you see a sequential scan on the vector column instead of HNSW index usage, your distance operator is likely mismatched with your index type. For example, an index built with vector_l2_ops cannot serve a <=> (cosine) query; that needs vector_cosine_ops.
- Tokenization Mismatch: Ensure the language configuration in websearch_to_tsquery matches the one used to build your documents' tsvector. If you have multilingual data, you need to handle tsvector columns per language or use the simple configuration; otherwise, the wrong stemming and stop-word rules will kill your lexical search quality.
- Vector Normalization: Inner product (<#>) only agrees with cosine similarity when your vectors are unit length, so normalize them before insertion if you plan to use it. If they aren't normalized, use cosine_distance (<=>) instead of inner_product (<#>) to ensure the math stays consistent.
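The normalization point can be checked with a few lines of arithmetic. This is a small sketch in plain Python with made-up two-dimensional vectors; the helper names are illustrative and not part of pgvector.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [1.0, 0.0]

raw_inner = dot(a, b)                               # grows with vector length
unit_inner = dot(l2_normalize(a), l2_normalize(b))  # equals the cosine

print(raw_inner, unit_inner, cosine_similarity(a, b))
```

On the raw vectors the inner product reflects their lengths as well as their angle; after normalization it matches the cosine similarity, which is why the two operators are only interchangeable for unit-length embeddings.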
Hybrid search isn't just about throwing more tech at the problem; it’s about aligning the retrieval mechanism with how users actually query the system. Keep the logic in the database, keep your indices optimized, and RRF will handle the heavy lifting of merging your scores.
Aditya Shenvi