Vector Databases in Production: When You Actually Need One

There’s a default that quietly costs teams a lot of money: “we’re building RAG, so we need a vector database.” It gets written into architecture diagrams in week one, before anyone has measured a single query. Then six months later the team is operating a second datastore new connection pools, new monitoring, new backup procedures, a separate bill to serve a few hundred thousand vectors that Postgres would have handled without noticing.

This piece is the no-hype version of the decision. When is a dedicated vector database the right call, when is it overkill, and what actually breaks at scale that should trigger the move? The answer in 2026 has shifted, and the shift favors not adding infrastructure until you can justify it.

First principle: a vector index is a feature, not necessarily a system

A “vector database” does two things: store embeddings and run approximate-nearest-neighbor (ANN) search over them, usually with an HNSW or IVF index. That capability no longer requires a dedicated product. Postgres, via the pgvector extension, now offers it inside the database most teams already run with the indexing, recall tuning, and (with pgvectorscale) the throughput to match purpose-built stores at moderate scale.

So the real question isn’t “which vector database?” It’s “do I need a separate one at all, or is this a feature I switch on in infrastructure I already operate?” Frame it that way and the bar for adding a system gets appropriately high because every new datastore is a permanent operational tax, not a one-time setup.

The honest 2026 comparison

Dimension	pgvector (Postgres)	Dedicated vector DB (Pinecone / Qdrant / Milvus)
Best when	You already run Postgres; vectors relate to relational data	Vector search is the primary, high-scale workload
Comfortable scale	Single-digit millions; ~50M with pgvectorscale	Tens of millions to billions
Latency at scale	Good to great on right hardware	Tuned for sub-50ms p99 at high QPS
Hybrid queries (vector + SQL filters)	Native one query, with joins/foreign keys	Often filter-then-vector; can blow up latency
Consistency	ACID; vectors participate in transactions	Usually eventual; separate consistency model
Multi-tenancy / isolation	Manual; shares resources with relational load	Often first-class; isolates vector workload
Ops burden	None new your DBA already runs Postgres	A second system to deploy, monitor, back up, scale
Cost shape	Folded into existing Postgres spend	Usage-based; competitive low, escalates past ~10M vectors / high QPS

Two rows carry most of the decision: hybrid queries and ops burden. If your retrieval is really “find similar items that are also in stock, under $50, and belong to this tenant,” that’s a join and doing it as one SQL query in Postgres beats stitching two systems together. And the ops burden is the cost that never shows up in the proof-of-concept but dominates the second year.

When pgvector is the right answer (most of the time)

Reach for pgvector and stop there when:

You already run Postgres. Adding an extension is dramatically cheaper than operating a new datastore. No new connection pools, monitoring, or backup runbooks.
You’re under ~10M vectors (and pgvectorscale stretches this toward ~50M). Most applications never exceed this. Be honest about your real trajectory, not your pitch-deck one.
Vectors are coupled to relational data. Embeddings reference rows; you need joins, foreign keys, and filtered similarity in a single query. This is pgvector’s home turf and a genuine weakness of standalone stores.
ACID matters. When a vector update must be atomic with a relational update rollbacks, consistency Postgres’s transactional model is an advantage, not a constraint.
Query volume is moderate. Vector count matters less than QPS. Ten million vectors at 100 queries/minute is a completely different problem than one million at 10,000 QPS.

The 2026 benchmark reality has retired the old “pgvector is too slow” objection: with pgvectorscale, Postgres has been measured at hundreds of QPS at 99% recall on tens of millions of vectors competitive with dedicated stores for the workloads most teams actually have (multiple independent 2026 benchmarks). The slow-pgvector assumption is two years out of date.

When a dedicated vector database earns its place

Switch or add one alongside Postgres when you can name the specific bottleneck. Vague “we might scale” is not a bottleneck; these are:

Scale past pgvector’s comfort zone. Beyond ~50–100M vectors, HNSW index rebuild times in Postgres become a real operational constraint. This is the cleanest trigger.
Strict latency SLAs at high QPS. If you need consistent sub-50ms p99 under heavy concurrent load, purpose-built engines with execution paths tuned for that will hold the line more predictably.
Workload isolation. In pgvector, heavy vector queries share CPU, memory, and I/O with your relational workload large index builds compete with production queries. A dedicated store isolates that, giving predictable performance to both systems. This is an underrated reason that has nothing to do with raw vector count.
First-class multi-tenancy. Millions of vectors per tenant with cross-tenant search patterns is where dedicated stores’ tenancy features pay off.
Specialized capabilities. Sub-vector product quantization, dynamic-edge-pruning HNSW, late-interaction retrieval if you genuinely need these, that’s a real reason, not a checkbox.

The hidden ops cost of either path (the part the vendors skip)

The calendar promised the hidden costs, so here they are plainly:

pgvector’s hidden costs:

Index builds consume shared memory and compete with production queries; large HNSW builds need a tuning and maintenance-window strategy.
Heavy vector load can degrade your relational performance a coupling risk you must capacity-plan for.
Scaling reads means scaling Postgres (replicas), which carries its own cost curve.

Dedicated vector DB’s hidden costs:

It’s a whole second system: deploy, monitor, secure, back up, upgrade, and staff. That operational tax is permanent.
Usage-based pricing looks cheap in the POC and can escalate sharply once the system succeeds a 10M-query/month RAG app is where the bill surprises people.
Data sync and consistency between your source of truth (often Postgres) and the vector store becomes your problem a pipeline to build, monitor, and reconcile.

The mistake isn’t choosing one or the other. It’s choosing either without pricing its real second-year ops cost. The cheapest system is usually the one you don’t have to operate.

A pattern worth knowing: don’t migrate, augment

The most common production endgame at scale isn’t “rip out Postgres.” It’s Postgres for writes and consistency, a dedicated store for read scaling the system of record stays transactional in Postgres, embeddings get replicated to a Qdrant/Milvus read layer tuned for high-throughput similarity search. You get ACID where it matters and scale where you need it, at the cost of a sync pipeline you now own. That trade pipeline complexity for performance isolation is the right one only when a named bottleneck justifies it.

Common mistakes

Adopting a dedicated vector DB in week one. Most teams over-engineer this before measuring anything. Start on Postgres; move when something breaks.
Migrating on benchmarks instead of bottlenecks. A blog’s QPS number isn’t your workload. Decide on your scale, latency SLA, and query shape.
Ignoring hybrid-query needs. If retrieval is inseparable from relational filters, a standalone store quietly forces you into two-system gymnastics.
Forgetting the sync pipeline. Two datastores means a consistency problem. Budget for the pipeline, monitoring, and reconciliation not just the database.
Pricing the POC, not the success case. Usage-based costs scale with adoption. Model the bill at the volume you’re hoping to reach.

Conclusion

In 2026, the default should flip. Don’t start by choosing a vector database start by asking whether you need a separate one at all. For most teams already on Postgres, pgvector serves production RAG well into the tens of millions of vectors while reusing infrastructure you already trust. Graduate to a dedicated store when you can point at the specific thing that broke: scale beyond ~50M, strict p99 at high QPS, workload isolation, or real multi-tenancy.

Architecture maturity isn’t using the most specialized tool. It’s using the least specialized tool that meets the requirement and knowing exactly which requirement would force you to upgrade.

CTA

Trying to decide if your RAG workload has actually outgrown Postgres — or if a dedicated vector DB would just be expensive ops overhead? That’s a sizing question with a real answer.

Get Architecture Advice → we’ll size your vector workload against your real scale, latency SLA, and query shape, and tell you honestly whether pgvector holds or a dedicated store earns its cost. No rip-and-replace by default.

FAQs

Usually not a dedicated one. If you already run Postgres and have under ~10M vectors (up to ~50M with pgvectorscale), pgvector handles production RAG while reusing infrastructure you already operate. A standalone vector database is worth it once you hit a specific bottleneck like very large scale or strict latency SLAs.

When you can name the bottleneck: roughly beyond 50–100M vectors (HNSW index rebuild times become constraining), strict sub-50ms p99 latency at high QPS, heavy multi-tenancy with cross-tenant search, or the need to isolate vector load from your relational workload. Vague future growth isn’t a reason; a measured limit is.

Yes, for most workloads. With the pgvectorscale extension, Postgres has been benchmarked at hundreds of QPS at 99% recall on tens of millions of vectors competitive with dedicated stores. The “pgvector is too slow” claim reflects older versions and no longer holds for typical production RAG.

It depends on scale. Dedicated managed services can be cost-competitive at low volume, but usage-based pricing tends to escalate past ~10M vectors or high query volume. pgvector’s cost folds into existing Postgres spend. The honest comparison is total cost of ownership including the ops time a separate system consumes.

Yes, and it’s a common endgame at scale: Postgres for writes and consistency, a dedicated store (Qdrant, Milvus) as a read-scaling layer for high-throughput similarity search. You gain performance isolation at the cost of building and maintaining a sync pipeline between the two worth it only when a real bottleneck justifies it.

Adopting a dedicated vector database in week one, before measuring anything. Most teams over-engineer this decision and end up operating a second datastore they didn’t need. Start on Postgres with pgvector and migrate only when a specific, measured limit forces the move.

First principle: a vector index is a feature, not necessarily a system

The honest 2026 comparison

When pgvector is the right answer (most of the time)

When a dedicated vector database earns its place

The hidden ops cost of either path (the part the vendors skip)

A pattern worth knowing: don’t migrate, augment

Common mistakes

Conclusion

CTA

FAQs

Do I need a vector database for RAG?+

When does pgvector stop being enough?+

Is pgvector actually fast enough for production in 2026?+

Is Pinecone more expensive than pgvector?+

Can I use both Postgres and a dedicated vector database together?+

What’s the most common vector-database mistake?+

Related Posts