Key takeaways

  1. The first decision isn't which vector DB — it's whether you need a dedicated one at all. Below ~10M vectors on Postgres, pgvector usually wins on cost and simplicity.
  2. pgvector is the cheapest option at almost every scale under ~50M vectors because it rides your existing Postgres; pgvectorscale stretches the same setup further into the tens of millions before a migration is forced.
  3. Pinecone is the zero-ops managed pick: it has a free 2GB tier and runs ~$70/mo at 10M vectors, but cost climbs steeply to ~$700+/mo at 100M.
  4. Qdrant (Rust) offers the best price-performance among dedicated engines — lowest p50 latency and strong filtering, self-hostable for ~$30–50/mo.
  5. Pricing is usage-based and volatile in 2026; benchmark on your own data and verify current rates before committing.

The most important vector-database decision you’ll make in 2026 isn’t which one. It’s whether you need a dedicated one at all. For a large share of teams shipping retrieval-augmented generation today, the honest answer is “not yet” — and the reason that answer is now defensible is the single biggest shift in this category over the last two years.

Embeddings — the numeric vectors models emit to capture meaning — used to demand specialized infrastructure the moment you had more than a toy dataset. (If that sentence needs unpacking, start with our explainer on what an embedding really is.) In 2026 that’s no longer automatic: Postgres can do competent vector search in-process, the managed services have gone serverless, and the open-source engines have matured into genuinely different tools rather than near-clones. The comparison table above has the specifics; this piece is about which one fits you.

First question: do you even need a dedicated vector database?

If you already run Postgres and you’re under roughly 10 million vectors, pgvector is very likely your answer, and reaching for anything else is premature optimization. It’s a Postgres extension, so your embeddings sit in the same database as your relational data — same backups, same transactions, same JOINs — with no second system to operate. It supports the distance metrics that matter (cosine, L2, inner product) and an HNSW index that, at the 1M-vector scale, is competitive with purpose-built engines. On managed Postgres it runs around $45/month at 10M vectors, and it stays the cheapest option at essentially every scale below ~50M because it adds nothing to your stack. When you outgrow it, the pgvectorscale extension pushes the same setup into the tens of millions before you’re forced to migrate.
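To make the three metrics concrete, here is a minimal pure-Python sketch of what each one computes. The function names are illustrative only; the operator mapping in the docstrings follows pgvector's documented operators, where `<#>` returns the negative inner product so that smaller values still mean "closer".

```python
import math

def l2_distance(a, b):
    """Euclidean distance; corresponds to pgvector's <-> operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a, b):
    """Negated dot product; corresponds to pgvector's <#> operator."""
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity; corresponds to pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 0.0]
doc_same_direction = [2.0, 0.0]  # same direction as the query, different length
doc_orthogonal = [0.0, 1.0]      # unrelated direction

print(cosine_distance(query, doc_same_direction))  # direction matches, so distance is 0
print(cosine_distance(query, doc_orthogonal))
print(l2_distance(query, doc_same_direction))      # length differs, so L2 is non-zero
```

Cosine ignores vector length while L2 does not, which is why the metric you index with should match how your embedding model was trained to be compared.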

You reach for a dedicated vector database when one of these becomes true: you need very large scale, you need the lowest achievable latency, you need rich metadata filtering at query time, or you simply don’t want to run the infrastructure. Here’s how the four worth considering differ.

Pinecone — the zero-ops managed default

Pinecone is the one to pick when you want vector search to be someone else’s operational problem. It’s fully managed and serverless: you write and query through an API and never think about nodes, shards, or rebalancing. There’s a free tier (2GB storage, 1M reads/month), and at moderate scale the bill is easy to reason about: roughly $70/month at 10M vectors.

The caveat is the shape of the cost curve. Pinecone’s usage-based pricing (storage plus read/write units) climbs steeply with scale: the same workload at 100M vectors can run $700+/month, where a self-hosted engine or pgvector stays well under $100. That’s the managed-service trade in one number — you pay a premium to never touch a server, and the premium grows with you. For many teams shipping production RAG without a platform team, that’s a fair deal. For cost-sensitive scale, it’s the thing to model carefully.

Qdrant — the best price-performance

Qdrant is the open-source engine to beat on raw efficiency. Written in Rust, it posts the lowest p50 latency of the dedicated engines in 2026 benchmarks (~4ms at 1M vectors, versus ~8ms for managed Pinecone). Its standout feature is first-class payload filtering: fast, precise filtering of results by metadata, which is exactly what production RAG and faceted search lean on. You can self-host it on a small VPS for ~$30–50/month or use Qdrant Cloud (~$65/month at 10M vectors). If you want a dedicated engine and care about getting the most per dollar, this is the strongest all-rounder.

Filtering is also where engines quietly diverge: Qdrant’s payload filtering stays fast under load, whereas heavy metadata filters can add noticeable latency on some managed services. If your queries routinely combine “similar to this” with “and matches these constraints,” test that exact pattern — not just raw nearest-neighbour speed — before you decide.
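One reason to test filtered queries specifically: if a filter is applied after the nearest-neighbour step, a selective filter can leave you with fewer than k results. The toy below uses brute-force search over hypothetical items to show the difference. It illustrates the general pattern only and is not a description of how any particular engine implements filtering.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical items: (id, embedding, metadata)
ITEMS = [
    ("a", [1.0, 0.0], {"lang": "en"}),
    ("b", [0.9, 0.1], {"lang": "en"}),
    ("c", [0.8, 0.2], {"lang": "de"}),
    ("d", [0.1, 0.9], {"lang": "de"}),
    ("e", [0.0, 1.0], {"lang": "de"}),
]

def post_filter_search(query, k, predicate):
    """Take the k nearest first, then drop items that fail the filter."""
    nearest = sorted(ITEMS, key=lambda it: cosine_distance(query, it[1]))[:k]
    return [it[0] for it in nearest if predicate(it[2])]

def pre_filter_search(query, k, predicate):
    """Filter first, then take the k nearest among the survivors."""
    candidates = [it for it in ITEMS if predicate(it[2])]
    nearest = sorted(candidates, key=lambda it: cosine_distance(query, it[1]))[:k]
    return [it[0] for it in nearest]

def is_german(meta):
    return meta["lang"] == "de"

query = [1.0, 0.0]
print(post_filter_search(query, 2, is_german))  # the two nearest are both "en", so nothing survives
print(pre_filter_search(query, 2, is_german))   # the two nearest German items
```

When you benchmark, run queries whose filters are as selective as your real ones and check both latency and how many results actually come back.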

Weaviate — hybrid search, natively

Weaviate (Go) earns its place when your problem isn’t pure vector similarity but a blend of semantic and keyword search. Its native hybrid search combines vector and structured/keyword retrieval in one query, and its modular architecture plugs in embedding models and rerankers. Managed Weaviate Cloud starts around $25/month — the cheapest managed entry point among the major players — and it self-hosts as well. Choose it when rich data modeling and hybrid relevance matter more than squeezing out the last millisecond.
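Hybrid search ultimately has to merge two rankings, one from keyword retrieval and one from vector similarity. Reciprocal rank fusion is one common, simple way to do that. The sketch below is a generic illustration with hypothetical document IDs; it is not a claim about which fusion method Weaviate applies by default.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists. Each list contributes 1 / (k + rank)
    per document, with rank starting at 1. k=60 is a conventional default."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

keyword_hits = ["doc3", "doc1", "doc7"]  # e.g. from a keyword/BM25 ranker
vector_hits = ["doc1", "doc5", "doc3"]   # e.g. from a vector similarity search

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear in both lists (here `doc1` and `doc3`) rise to the top, which is the behaviour you want when a query has both an exact-term and a semantic component.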

Chroma — the developer-first prototyping option

Chroma optimizes for getting started, not for scale. It’s open-source and can run embedded — in-process, even in-memory — so you can stand up vector search in a notebook or a small app in minutes, with a Rust core under a Python-first API. It’s free, and there’s a managed cloud when you outgrow local. It’s the right call for prototyping, demos, and small-to-mid workloads; it is not where you put a 100M-vector production index.

What pgvector actually looks like

Part of pgvector’s appeal is how little ceremony it takes — vector search is just SQL:

-- Enable the extension (a no-op if it is already installed)
CREATE EXTENSION IF NOT EXISTS vector;

-- Store items alongside their embeddings (e.g. 1536-dim from an embedding model)
CREATE TABLE items (
  id        serial PRIMARY KEY,
  name      text,
  embedding vector(1536)
);

-- Approximate-nearest-neighbour index for fast cosine search
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);

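-- Find the 5 items most similar to a query embedding.
-- <=> is cosine distance, so 1 - distance gives cosine similarity.
-- :query stands for a bound vector parameter supplied by your client.
SELECT name, 1 - (embedding <=> :query) AS cosine_similarity
FROM items
ORDER BY embedding <=> :query
LIMIT 5;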

No new service, no data sync, no second source of truth — the search lives next to the rest of your application data. That is the whole pitch.
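From application code, the query embedding is usually passed as a bound parameter. pgvector accepts vectors as text in a bracketed, comma-separated form such as `[1,2,3]`. The helper below is a hypothetical convenience that only shows the shape of that value; many drivers and ORMs have pgvector adapters that handle the conversion for you.

```python
def to_pgvector_literal(embedding):
    """Format a sequence of numbers as pgvector text input, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

literal = to_pgvector_literal([0.1, 0.25, 3])
print(literal)
```

Bind the resulting string as a query parameter and cast it to `vector` in SQL rather than interpolating it into the statement text.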

What it costs at scale

Cost is where these options diverge most, so it’s worth a concrete pass. At ~10M vectors the field is close: pgvector on managed Postgres runs ~$45/month, Qdrant Cloud ~$65/month (or ~$30–50 self-hosted), and Pinecone ~$70/month; Weaviate Cloud starts around $25/month, though that is an entry price rather than a 10M-vector quote. The gap opens at scale. By ~100M vectors, Pinecone’s usage-based model can pass $700/month, while a self-hosted engine or pgvector typically stays under $100. You’re paying for managed convenience, and that premium compounds as you grow. The flip side is real, though: self-hosting trades that bill for engineering time, monitoring, and on-call. Model both lines. The cheaper invoice isn’t always the cheaper system once a person’s time is in the equation.
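A back-of-the-envelope model makes the "model both lines" advice concrete. The invoice figures below are the approximate ones quoted in this article; the engineering hours and hourly rate are purely hypothetical placeholders that you should replace with your own numbers.

```python
# Approximate monthly invoices (USD) at ~100M vectors, as quoted in this article.
# "Under $100" for self-hosting is modeled at its upper bound.
INVOICE_AT_100M = {"pinecone_managed": 700, "self_hosted": 100}

# Hypothetical assumptions, not from the article: monthly ops time and loaded hourly cost.
OPS_HOURS_PER_MONTH = {"pinecone_managed": 0, "self_hosted": 8}
HOURLY_RATE_USD = 90

def total_monthly_cost(option):
    """Invoice plus the cost of the engineering time spent operating it."""
    return INVOICE_AT_100M[option] + OPS_HOURS_PER_MONTH[option] * HOURLY_RATE_USD

def break_even_hours(managed_invoice, self_hosted_invoice, hourly_rate):
    """Monthly ops hours at which self-hosting costs the same as managed."""
    return (managed_invoice - self_hosted_invoice) / hourly_rate

for option in INVOICE_AT_100M:
    print(option, total_monthly_cost(option))

print(round(break_even_hours(700, 100, HOURLY_RATE_USD), 1), "ops hours/month to break even")
```

Under these made-up inputs, self-hosting only wins if it takes fewer than about seven hours of someone's time per month. With your own rates the answer may flip, which is the point of running the numbers.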

How to choose in 2026

Match the tool to your real constraints, not to a leaderboard:

  • On Postgres, under ~10M vectors → pgvector. Don’t add infrastructure you don’t need yet.
  • Want zero ops at any scale → Pinecone, and model the cost at your target scale before committing.
  • Want the fastest open-source engine with strong filtering → Qdrant.
  • Need hybrid keyword-plus-vector relevance → Weaviate.
  • Prototyping or small-scale → Chroma.
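The checklist above can be written down as a small function, which is handy for making the assumptions explicit in a design doc. This is a sketch: the parameter names are hypothetical, the 10M threshold is the article's rough guidance rather than a hard limit, and the precedence when several conditions apply is our own assumption.

```python
def pick_vector_store(on_postgres, vector_count, zero_ops=False,
                      needs_hybrid=False, prototyping=False):
    """Return a starting recommendation following the checklist order."""
    if on_postgres and vector_count < 10_000_000:
        return "pgvector"   # do not add infrastructure you do not need yet
    if zero_ops:
        return "Pinecone"   # model the cost at target scale before committing
    if needs_hybrid:
        return "Weaviate"   # keyword-plus-vector relevance
    if prototyping:
        return "Chroma"     # small-scale or exploratory work
    return "Qdrant"         # fast open-source engine with strong filtering

print(pick_vector_store(on_postgres=True, vector_count=2_000_000))
print(pick_vector_store(on_postgres=False, vector_count=80_000_000, zero_ops=True))
print(pick_vector_store(on_postgres=False, vector_count=80_000_000))
```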

The real 2026 story

Step back and the trend is clear: the floor of this market is being eaten from below. pgvector and pgvectorscale keep absorbing workloads that used to require a dedicated database, and serverless offerings keep absorbing the operational burden that used to justify a platform team. The dedicated engines haven’t lost — they’ve specialized, competing on latency, filtering, hybrid search, and scale rather than on simply existing. The practical consequence for you is liberating: you can start with the boring default that’s already in your stack, prove the use case, and adopt a specialized engine only when a specific limit — scale, latency, filtering, ops — actually bites. One caveat to carry into any decision: pricing here is usage-based and genuinely volatile, so the figures above are approximate starting points. Benchmark on your own data, at your own recall target, and verify current rates before you commit a workload.

Frequently asked questions

What is a vector database?

A vector database stores and searches embeddings — the numeric vectors AI models produce to represent the meaning of text, images, or audio. Instead of exact matches, it finds the nearest vectors by similarity, which is what powers semantic search, RAG, recommendations, and agent memory. See our explainer on what an embedding actually is for the underlying idea.

Do I need a dedicated vector database, or is pgvector enough?

If you already run Postgres and have under roughly 10 million vectors, pgvector is usually enough and the cheapest, simplest choice — your vectors live next to your relational data with ACID guarantees and JOINs. Reach for a dedicated database (Pinecone, Qdrant, Weaviate) when you need very large scale, the lowest possible latency, advanced filtering, or you want managed infrastructure with no ops.

Which vector database is cheapest?

For most workloads under ~50M vectors, pgvector is cheapest because it piggybacks on existing Postgres (~$45/mo at 10M vectors on managed Postgres). Among dedicated options, Qdrant has the best price-performance: self-hosting on a small VPS runs ~$30–50/mo. Managed services like Pinecone start free but scale up steeply (~$700+/mo at 100M vectors).

Which vector database is fastest?

In published 2026 benchmarks at 1M vectors (1536 dimensions), Qdrant posts the lowest p50 latency (~4ms), with Pinecone around 8ms as a managed service. pgvector with an HNSW index is competitive at the 1M scale; at tens of millions, the pgvectorscale extension closes much of the gap. Always benchmark on your own data and recall target — results shift with dimensions, filters, and index settings.
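If you do run your own benchmark, the two numbers worth recording together are latency percentiles and recall against an exact (brute-force) result. The harness below is a minimal sketch: `fake_search` is a hypothetical stand-in for your engine's client call, and the percentile uses the simple nearest-rank definition.

```python
import math
import random
import time

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def percentile(samples, pct):
    """Nearest-rank percentile of a list of numbers (pct between 0 and 100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def measure_latencies_ms(search_fn, queries):
    """Time each call to search_fn(query) and return latencies in milliseconds."""
    latencies = []
    for q in queries:
        start = time.perf_counter()
        search_fn(q)
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

def fake_search(query):
    """Hypothetical stand-in: replace with a real query against your engine."""
    return sorted(range(1000), key=lambda i: abs(i - query))[:10]

queries = [random.randrange(1000) for _ in range(200)]
latencies = measure_latencies_ms(fake_search, queries)
print("p50 ms:", percentile(latencies, 50))
print("p95 ms:", percentile(latencies, 95))
print("recall@3:", recall_at_k([1, 2, 9], [1, 2, 3]))
```

A latency figure without the recall it was measured at is not comparable across engines, since most approximate indexes let you trade one for the other.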

Managed cloud or self-hosted?

Choose managed (Pinecone, or the cloud tiers of Qdrant/Weaviate/Chroma) when you want to avoid running infrastructure and can accept usage-based pricing. Self-host the open-source engines (Qdrant, Weaviate, Chroma, pgvector) when you want control, lower cost at scale, or data residency — at the cost of operating the database yourself.

About Aditya Marin Gasga

Founding Editor

Aditya covers the whole AI surface area for Signal — frontier models, agent infrastructure, the economics of inference, and the policy decisions that quietly shape what everyone else can build. He writes for operators who need a calibrated view of what's actually shipping versus what's keynote theatre.

  • Founder of Signal; sets the publication's editorial line
  • A decade across product, growth, and AI tooling at venture-backed startups
  • Reads the model release notes, the system cards, and the benchmark papers — and tells you which ones matter