Hybrid Search Architectures: Integrating BM25 with Dense Embeddings in pgvector

Retrieval-Augmented Generation (RAG) pipelines often fail when relying solely on vector search. If a user searches for a specific product SKU or a niche technical term, dense embeddings, which excel at capturing semantic intent, often miss the exact match. That’s why hybrid search has become a common default for production-grade retrieval. By combining the lexical precision of BM25-style keyword ranking with the conceptual depth of vector embeddings, you cover the failure modes of each method with the strengths of the other.

The Strategy: Why Hybrid Matters

When I build search systems, I treat vectors as the "what does this mean?" layer and BM25 as the "what is this called?" layer.

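In Postgres with pgvector, this integration is powerful because you keep your data in a single source of truth. You don’t need to manage a separate Elasticsearch index and a Postgres database. You compute a vector ranking (typically by cosine distance or inner product) and a lexical ranking, then apply Reciprocal Rank Fusion (RRF) to merge them. One caveat: the built-in ts_rank function is a term-frequency ranking, not true BM25, so in stock Postgres it acts as a stand-in for the BM25 layer. RRF is essential here because it ignores the raw scores and works only on rank positions: each document earns 1 / (k + rank) from every list it appears in. Scores from two different mathematical distributions never have to be compared directly, which prevents one method from completely drowning out the other.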
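To make the fusion step concrete, here is a minimal sketch of RRF in plain Python, independent of the database. The function name rrf_fuse and the document IDs are hypothetical, chosen only for illustration; k=60 is the conventional constant.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) for every ID it contains,
    where rank starts at 1 for the best match.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings from the two retrieval methods.
vector_ranking = ["doc_a", "doc_b", "doc_c"]
lexical_ranking = ["doc_c", "doc_a", "doc_d"]

fused = rrf_fuse([vector_ranking, lexical_ranking])
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)
```

Documents that appear in both lists (doc_a and doc_c here) collect two contributions and rise above documents found by only one method, even though no raw score was ever compared.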

Implementation: The pgvector Setup

To make this work, I use the tsvector and tsquery features in Postgres alongside pgvector. Here is a clean implementation pattern using Python and SQLAlchemy.

from sqlalchemy import text
from sqlalchemy.orm import Session

def hybrid_search(db: Session, query_text: str, query_vector: list[float], top_k: int = 10):
    """
    Combines lexical search (ts_rank over a tsvector column) and vector
    search (cosine distance, <=>) using Reciprocal Rank Fusion (RRF).
    """

    # Each CTE ranks one method independently; the outer query sums the RRF terms.
    # 60 is the conventional RRF constant; it damps the gap between adjacent ranks.
    # CAST(:vector AS vector) makes the parameter type explicit, so the query does
    # not depend on how the database driver adapts a Python list.
    sql = text("""
    WITH vector_matches AS (
        SELECT id, 1.0 / (60 + row_number() OVER (ORDER BY embedding <=> CAST(:vector AS vector))) AS score
        FROM documents
        ORDER BY embedding <=> CAST(:vector AS vector)
        LIMIT 100
    ),
    lexical_matches AS (
        SELECT id, 1.0 / (60 + row_number() OVER (ORDER BY ts_rank(text_search_col, websearch_to_tsquery('english', :query)) DESC)) AS score
        FROM documents
        WHERE text_search_col @@ websearch_to_tsquery('english', :query)
        ORDER BY ts_rank(text_search_col, websearch_to_tsquery('english', :query)) DESC
        LIMIT 100
    )
    SELECT id, SUM(score) AS combined_score
    FROM (
        SELECT * FROM vector_matches
        UNION ALL
        SELECT * FROM lexical_matches
    ) combined
    GROUP BY id
    ORDER BY combined_score DESC
    LIMIT :top_k;
    """)

    # pgvector accepts the text form '[0.1,0.2,...]', which the CAST above parses.
    vector_literal = "[" + ",".join(str(float(x)) for x in query_vector) + "]"
    params = {"vector": vector_literal, "query": query_text, "top_k": top_k}
    return db.execute(sql, params).fetchall()
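
The query above assumes a documents table with an embedding column and a text_search_col column, plus the two indexes that make each CTE fast. Below is a rough sketch of that schema as a list of DDL statements. The body column, the index names, and the 384-dimension embedding size are assumptions made for illustration; adjust them to your own model and data.

```python
# Hypothetical DDL for the schema the hybrid query expects. The `body` column,
# the index names, and the vector dimension (384) are illustrative assumptions.
SCHEMA_STATEMENTS = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    """
    CREATE TABLE IF NOT EXISTS documents (
        id bigserial PRIMARY KEY,
        body text NOT NULL,
        embedding vector(384),
        text_search_col tsvector
    )
    """,
    # GIN index serves the @@ full-text match in the lexical CTE.
    "CREATE INDEX IF NOT EXISTS documents_tsv_gin "
    "ON documents USING gin (text_search_col)",
    # vector_cosine_ops is the operator class that pairs with the <=> operator.
    "CREATE INDEX IF NOT EXISTS documents_embedding_hnsw "
    "ON documents USING hnsw (embedding vector_cosine_ops)",
]


def apply_schema(execute) -> int:
    """Run each statement through a caller-supplied execute(sql) callable.

    Returns the number of statements executed.
    """
    count = 0
    for statement in SCHEMA_STATEMENTS:
        # Collapse whitespace so multi-line statements are sent as one line.
        execute(" ".join(statement.split()))
        count += 1
    return count
```

With the SQLAlchemy session used above, this could be called as apply_schema(lambda s: db.execute(text(s))) followed by db.commit(). The HNSW index is built with vector_cosine_ops so that it matches the <=> operator used in the query.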

Architectural Trade-offs

1. Indexing Overhead

A GIN index on your tsvector column is effectively required for query performance, but it adds work to every INSERT and UPDATE. If your data changes every second, you’ll notice the write overhead. I usually recommend a background worker to update the tsvector column rather than doing it in the main transaction block. The trade-off is that new or edited rows are not lexically searchable until the worker catches up.

2. The Normalization Problem

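Raw lexical scores and vector distances exist on different scales: BM25 scores are unbounded, while cosine distance falls between 0 and 2, with lower meaning closer. You cannot simply add them. If you try (vector_score * 0.5) + (bm25_score * 0.5), whichever score has the larger numeric range dominates the sum, and the 0.5 weights no longer mean what they appear to mean. Using RRF (as shown above) sidesteps this because it only looks at rank positions, which makes it the most robust way to avoid manual weight tuning.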
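
A small worked example shows how the naive blend goes wrong. The scores below are made up for illustration (they are not output from any real system), and the helper names are hypothetical: the lexical scores are on a BM25-like unbounded scale, the vector scores are cosine similarities.

```python
# Made-up scores on two different scales, for illustration only.
lexical_scores = {"doc_a": 14.2, "doc_b": 9.7, "doc_c": 3.1}    # unbounded, BM25-like
vector_scores = {"doc_a": 0.41, "doc_b": 0.55, "doc_c": 0.93}   # cosine similarity


def rank_order(scores: dict[str, float]) -> list[str]:
    """Return IDs sorted from highest to lowest score."""
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


def weighted_sum(lexical: dict[str, float], vector: dict[str, float], w: float = 0.5) -> list[str]:
    """Naive blend: w * vector score + (1 - w) * lexical score, on raw values."""
    ids = set(lexical) | set(vector)
    blended = {
        doc_id: w * vector.get(doc_id, 0.0) + (1 - w) * lexical.get(doc_id, 0.0)
        for doc_id in ids
    }
    return rank_order(blended)


naive = weighted_sum(lexical_scores, vector_scores)
print(naive)
print(rank_order(lexical_scores))
print(rank_order(vector_scores))
```

Despite the nominal 50/50 weighting, the blended order is identical to the lexical order, because the lexical values are an order of magnitude larger. The vector ranking, which puts doc_c first, has no effect on the outcome.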

3. Memory Usage

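If your dataset exceeds a few million rows, keep an eye on your work_mem. The sorting operations behind each ranking can be memory-intensive. Setting a reasonable LIMIT inside the CTEs (as I did with LIMIT 100) caps how many rows reach the fusion step and lets the vector side stop after the index returns its top candidates. Note that the lexical side still has to score every row that matches the tsquery before it can pick the top 100, so very common search terms remain the expensive case. In my experience, this pattern has kept latency under 100ms.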

Debugging Tips

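Check the Explain Plan: Always run EXPLAIN ANALYZE on your hybrid query. If you see a sequential scan on the documents table instead of an HNSW index scan, the distance operator in your query likely doesn’t match the operator class the index was built with (for example, <=> needs an index created with vector_cosine_ops).
Tokenization Mismatch: Ensure the language configuration in websearch_to_tsquery matches the one used to build the tsvector column. If you have multilingual data, either maintain a tsvector column per language or use the simple configuration; otherwise, mismatched stemming and stop-word lists will hurt your lexical search quality.
Vector Normalization: Cosine distance (<=>) is insensitive to vector length, so it works whether or not your vectors are normalized. Inner product (<#>) produces the same ordering as cosine only when every vector has unit length, so normalize before insertion if you use it. If you can’t guarantee normalized vectors, stick with <=> to keep the math consistent. Also note that <#> returns the negative inner product, so a smaller value still means a closer match.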
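
The normalization point is easy to verify by hand. The following sketch uses plain Python lists rather than any particular embedding library, and the helper names are hypothetical; it shows that the inner product matches cosine similarity only after both vectors are scaled to unit length.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length; a zero vector cannot be normalized."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Inner product divided by the product of the two vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]

raw_ip = dot(a, b)                                # depends on vector length
unit_ip = dot(l2_normalize(a), l2_normalize(b))   # length-independent
cos = cosine_similarity(a, b)

print(raw_ip, unit_ip, cos)
```

Here the raw inner product is 24.0 while the cosine similarity is 0.96; after normalization the inner product also comes out at 0.96 (up to floating-point rounding). That is why <#> is only a safe substitute for <=> on unit-length vectors.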

Hybrid search isn’t just about throwing more tech at the problem; it’s about aligning the retrieval mechanism with how users actually query the system. Keep the logic in the database, keep your indices optimized, and RRF will handle the heavy lifting of merging your two ranked lists.