pgvector Hybrid Search: RRF Beats Pure Vector

Pure cosine similarity fails the moment a user searches for "API v2.3 rate limit" or a product SKU — the embedding model can't tell exact identifiers apart from semantically adjacent noise. The fix is two indexes, one CTE, and a clear rule for when to trust each leg.

This is the article for the team that already runs Postgres, added pgvector six months ago, shipped "semantic search," and now has a Linear ticket titled "search returns wrong version of the docs." You don't need a second database. You need a keyword leg.

Why pure vector retrieval breaks on exact queries

Cosine distance (<=>) finds vectors in the same neighborhood. That works beautifully for natural-language prose: "how do I cancel a subscription" lands near "subscription cancellation flow" even though they share two words. It breaks the moment the query is short and identifier-heavy.

Search for "PostgreSQL 17 performance improvements" against a corpus of release notes and you'll get hits for 14, 15, and 16. The embedding model treats "17" as weak signal — version numbers, SKUs, error codes, and proper nouns all collapse into broad neighborhoods. Short queries provide limited context, so unrelated items look semantically similar.

This is not a pgvector bug. It's a property of dense retrieval. A better embedding model won't fix it. A second retrieval leg will.

The hybrid search SQL

Two indexes. One CTE. Reciprocal Rank Fusion to merge the results.

One-time schema setup

ALTER TABLE docs ADD COLUMN IF NOT EXISTS embedding vector(1536);
ALTER TABLE docs ADD COLUMN IF NOT EXISTS tsv tsvector
  GENERATED ALWAYS AS (to_tsvector('english', body)) STORED;

CREATE INDEX docs_embedding_hnsw
  ON docs USING hnsw (embedding vector_cosine_ops);

CREATE INDEX docs_tsv_gin ON docs USING gin (tsv);

Now the query. Both legs run in parallel, each produces a rank, RRF sums 1.0 / (k + rank) across the two. Lower distance is better, higher ts_rank is better — but RRF operates on rank positions, so the score-direction mismatch doesn't matter.

Hybrid search with RRF — copy-paste and edit the two parameters at the top

WITH params AS (
  SELECT $1::vector AS query_vec,
         websearch_to_tsquery('english', $2) AS query_tsq,
         60::int AS k
),
vector_leg AS (
  SELECT id, ROW_NUMBER() OVER () AS rank
  FROM (
    SELECT id
    FROM docs
    ORDER BY embedding <=> (SELECT query_vec FROM params)
    LIMIT 50
  ) top_vec
),
fts_leg AS (
  SELECT id,
         ROW_NUMBER() OVER (ORDER BY ts_rank(tsv, (SELECT query_tsq FROM params)) DESC) AS rank
  FROM docs
  WHERE tsv @@ (SELECT query_tsq FROM params)
  ORDER BY ts_rank(tsv, (SELECT query_tsq FROM params)) DESC
  LIMIT 50
)
SELECT d.id, d.title,
       COALESCE(1.0 / ((SELECT k FROM params) + v.rank), 0) +
       COALESCE(1.0 / ((SELECT k FROM params) + f.rank), 0) AS rrf_score
FROM (
  SELECT id FROM vector_leg
  UNION
  SELECT id FROM fts_leg
) candidates
JOIN docs d USING (id)
LEFT JOIN vector_leg v USING (id)
LEFT JOIN fts_leg    f USING (id)
ORDER BY rrf_score DESC
LIMIT 10;

That's the whole thing. Two parameters: a query vector and a query string. Both legs get capped at 50 candidates — RRF only cares about rank within the candidate set, and 50 is plenty for a top-10 result.

A benchmark scaffold you can run locally

Don't trust my claim that hybrid wins on mixed workloads. Run it.

Minimum seed + timing harness

CREATE TABLE docs (
  id        bigserial PRIMARY KEY,
  title     text,
  body      text,
  embedding vector(1536),
  tsv       tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED
);

INSERT INTO docs (title, body, embedding)
SELECT
  'doc ' || i,
  'PostgreSQL ' || (12 + (i % 6)) || ' notes: ' || md5(i::text),
  (SELECT array_agg(random())::vector FROM generate_series(1, 1536))
FROM generate_series(1, 100000) i;

CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON docs USING gin (tsv);

EXPLAIN (ANALYZE, BUFFERS)
WITH /* paste the hybrid CTE here */ ...;

Run EXPLAIN ANALYZE for three query shapes against your own corpus: a pure natural-language query, an identifier-heavy query ("v2.3", "ERR_TIMEOUT"), and a mixed one. The crossover is usually obvious within five queries.

Which leg to weight

RRF with k=60 is the right starting point. It's the convention from the original IR paper — not a value tuned for your data. Two adjustments are worth making once you've seen the benchmark output:

Corpus type	Query style	Bias toward	RRF `k` to try
Terminology-dense (API docs, legal, medical, changelogs)	Exact identifiers, codes, versions	FTS leg (weight 0.6 / 0.4)	20–40
Terminology-dense	Natural language	Balanced	60
Prose / narrative	Exact identifiers	FTS leg (weight 0.7 / 0.3)	20–40
Prose / narrative	Natural language	Vector leg (weight 0.6 / 0.4)	60–80

To weight the legs, multiply each 1.0 / (k + rank) term by a coefficient before summing. Lower k biases the merge toward top-ranked results from each leg, which helps when documents are short (product titles, support ticket subjects) and the top-3 is what actually matters.

You'll see "x% better answer quality" cited for hybrid in vendor posts. The numbers are from Microsoft's research on their own corpora — directional, not transferable. The honest claim is narrower: hybrid consistently beats either leg alone on mixed query workloads. The magnitude depends on your data.

RRF is the safest default because it doesn't require score normalization. It's a floor, not a ceiling. If your query types are predictable and your corpus is well-characterized, a tuned linear combination with proper score inversion and min-max normalization will outperform it. Start with RRF. Tune later when you have query logs to learn from.

Skip the second database

The common pushback when you propose hybrid is "this is getting complex — let's run Qdrant or Weaviate alongside Postgres." That's backwards for a team already on Postgres.

A 40-line CTE with two indexes is not complexity. Running, backing up, monitoring, securing, and version-upgrading a second database is. For 95% of Postgres shops, the operational overhead of a dedicated vector store dwarfs the SQL you just read.

The greenfield case — millions of vectors, no existing schema, vectors as the primary access pattern — is a real conversation. That's not the team reading this article.

Add the tsvector column and GIN index to your schema today, drop in the RRF CTE, and run EXPLAIN ANALYZE against a mixed query set before anyone opens a Qdrant tab. If the hybrid query is under 50ms on your real data, the second-database conversation is over.