Pure cosine similarity fails the moment a user searches for "API v2.3 rate limit" or a product SKU — the embedding model can't tell exact identifiers apart from semantically adjacent noise. The fix is two indexes, one CTE, and a clear rule for when to trust each leg.
This is the article for the team that already runs Postgres, added pgvector six months ago, shipped "semantic search," and now has a Linear ticket titled "search returns wrong version of the docs." You don't need a second database. You need a keyword leg.
Why pure vector retrieval breaks on exact queries
Cosine distance (<=>) finds vectors in the same neighborhood. That works beautifully for natural-language prose: "how do I cancel a subscription" lands near "subscription cancellation flow" even though they share two words. It breaks the moment the query is short and identifier-heavy.
Search for "PostgreSQL 17 performance improvements" against a corpus of release notes and you'll get hits for 14, 15, and 16. The embedding model treats "17" as weak signal — version numbers, SKUs, error codes, and proper nouns all collapse into broad neighborhoods. Short queries provide limited context, so unrelated items look semantically similar.
This is not a pgvector bug. It's a property of dense retrieval. A better embedding model won't fix it. A second retrieval leg will.
The hybrid search SQL
Two indexes. One CTE. Reciprocal Rank Fusion to merge the results.
ALTER TABLE docs ADD COLUMN IF NOT EXISTS embedding vector(1536);
ALTER TABLE docs ADD COLUMN IF NOT EXISTS tsv tsvector
GENERATED ALWAYS AS (to_tsvector('english', body)) STORED;
CREATE INDEX docs_embedding_hnsw
ON docs USING hnsw (embedding vector_cosine_ops);
CREATE INDEX docs_tsv_gin ON docs USING gin (tsv);Now the query. Both legs run in parallel, each produces a rank, RRF sums 1.0 / (k + rank) across the two. Lower distance is better, higher ts_rank is better — but RRF operates on rank positions, so the score-direction mismatch doesn't matter.
WITH params AS (
SELECT $1::vector AS query_vec,
websearch_to_tsquery('english', $2) AS query_tsq,
60::int AS k
),
vector_leg AS (
SELECT id, ROW_NUMBER() OVER () AS rank
FROM (
SELECT id
FROM docs
ORDER BY embedding <=> (SELECT query_vec FROM params)
LIMIT 50
) top_vec
),
fts_leg AS (
SELECT id,
ROW_NUMBER() OVER (ORDER BY ts_rank(tsv, (SELECT query_tsq FROM params)) DESC) AS rank
FROM docs
WHERE tsv @@ (SELECT query_tsq FROM params)
ORDER BY ts_rank(tsv, (SELECT query_tsq FROM params)) DESC
LIMIT 50
)
SELECT d.id, d.title,
COALESCE(1.0 / ((SELECT k FROM params) + v.rank), 0) +
COALESCE(1.0 / ((SELECT k FROM params) + f.rank), 0) AS rrf_score
FROM (
SELECT id FROM vector_leg
UNION
SELECT id FROM fts_leg
) candidates
JOIN docs d USING (id)
LEFT JOIN vector_leg v USING (id)
LEFT JOIN fts_leg f USING (id)
ORDER BY rrf_score DESC
LIMIT 10;That's the whole thing. Two parameters: a query vector and a query string. Both legs get capped at 50 candidates — RRF only cares about rank within the candidate set, and 50 is plenty for a top-10 result.
A benchmark scaffold you can run locally
Don't trust my claim that hybrid wins on mixed workloads. Run it.
CREATE TABLE docs (
id bigserial PRIMARY KEY,
title text,
body text,
embedding vector(1536),
tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED
);
INSERT INTO docs (title, body, embedding)
SELECT
'doc ' || i,
'PostgreSQL ' || (12 + (i % 6)) || ' notes: ' || md5(i::text),
(SELECT array_agg(random())::vector FROM generate_series(1, 1536))
FROM generate_series(1, 100000) i;
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON docs USING gin (tsv);
EXPLAIN (ANALYZE, BUFFERS)
WITH /* paste the hybrid CTE here */ ...;Run EXPLAIN ANALYZE for three query shapes against your own corpus: a pure natural-language query, an identifier-heavy query ("v2.3", "ERR_TIMEOUT"), and a mixed one. The crossover is usually obvious within five queries.
Which leg to weight
RRF with k=60 is the right starting point. It's the convention from the original IR paper — not a value tuned for your data. Two adjustments are worth making once you've seen the benchmark output:
| Corpus type | Query style | Bias toward | RRF k to try |
|---|---|---|---|
| Terminology-dense (API docs, legal, medical, changelogs) | Exact identifiers, codes, versions | FTS leg (weight 0.6 / 0.4) | 20–40 |
| Terminology-dense | Natural language | Balanced | 60 |
| Prose / narrative | Exact identifiers | FTS leg (weight 0.7 / 0.3) | 20–40 |
| Prose / narrative | Natural language | Vector leg (weight 0.6 / 0.4) | 60–80 |
To weight the legs, multiply each 1.0 / (k + rank) term by a coefficient before summing. Lower k biases the merge toward top-ranked results from each leg, which helps when documents are short (product titles, support ticket subjects) and the top-3 is what actually matters.
You'll see "x% better answer quality" cited for hybrid in vendor posts. The numbers are from Microsoft's research on their own corpora — directional, not transferable. The honest claim is narrower: hybrid consistently beats either leg alone on mixed query workloads. The magnitude depends on your data.
RRF is the safest default because it doesn't require score normalization. It's a floor, not a ceiling. If your query types are predictable and your corpus is well-characterized, a tuned linear combination with proper score inversion and min-max normalization will outperform it. Start with RRF. Tune later when you have query logs to learn from.
Skip the second database
The common pushback when you propose hybrid is "this is getting complex — let's run Qdrant or Weaviate alongside Postgres." That's backwards for a team already on Postgres.
A 40-line CTE with two indexes is not complexity. Running, backing up, monitoring, securing, and version-upgrading a second database is. For 95% of Postgres shops, the operational overhead of a dedicated vector store dwarfs the SQL you just read.
The greenfield case — millions of vectors, no existing schema, vectors as the primary access pattern — is a real conversation. That's not the team reading this article.
Add the tsvector column and GIN index to your schema today, drop in the RRF CTE, and run EXPLAIN ANALYZE against a mixed query set before anyone opens a Qdrant tab. If the hybrid query is under 50ms on your real data, the second-database conversation is over.