Vector databases moved from "experimental RAG (Retrieval-Augmented Generation)">RAG stack component" to "core production infrastructure" in 2025, and the 2026 market has settled into six serious contenders:Pinecone, Weaviate, Qdrant, Milvus, Chroma, and pgvector. The pricing spread is wide ($0 for self-hosted pgvector to $400+/month for managed Pinecone at scale), and the technical tradeoffs are wider — single-node vs distributed, dense-only vs hybrid search, native filtering vs post-filter, cold-start latency, and operational complexity. We loaded 10 million OpenAI text-embedding-3-small (1536-dim) vectors into each tool, ran 100 QPS workloads with p99< 100ms targets, and tracked cost, latency, recall, and operational pain over a 30-day production simulation.
The headline:Pinecone Serverlessremains the simplest production answer at $70/month Starter — you pay nothing for ops complexity but pay a premium for vectors-stored and queries-served.pgvectoron a $50/month Postgres instance now legitimately matches Pinecone on recall at 10M vectors and below — the operational simplicity of "it's just Postgres" makes it the highest-ROI choice for most teams.Qdrantwins raw performance benchmarks (QPS, p99) but the self-hosted operational tax is real.Weaviatewins hybrid search + multi-tenancy.Milvusvia Zilliz Cloud wins at 100M+ scale.Chromawins for prototyping but isn't a production answer above ~5M vectors. Below: the test, the per-tool deep dive, and the cost math.
Test Setup
We loaded a uniform corpus: 10,000,000 OpenAI text-embedding-3-small embeddings (1536 dimensions) derived from English Wikipedia articles, plus per-document metadata (article ID, category, language, last-modified date, view count). The same corpus and metadata schema went into all six tools. Index parameters were tuned per-tool by the respective vendors' recommended defaults for the closest preset they offered (HNSW with M=16, efConstruction=128 where applicable; IVF_PQ for Milvus at scale).
The workload:100 concurrent QPSsustained for 60 minutes, hybrid query (dense ANN + metadata filter on category + language), p99 latency target< 100ms, recall@10 target >0.95. We also measured: cold-start latency (first query after 5 min idle), write throughput (1M-document bulk insert duration), filter selectivity behavior (queries with 0.1%, 1%, 10%, 50% pass rate), and operational overhead in hours/month for someone with one prior week of training on each tool.
Important caveat: vector benchmark numbers shift quickly as engines tune. The relative ranking below was stable across two reruns separated by 6 weeks, but absolute numbers may drift by ±15-30% as new versions ship.
Quick Verdict
| Tool | Price (10M vec hot) | Best For | p99 Latency | QPS @100ms |
|---|---|---|---|---|
| Pinecone Serverless | $140-220/mo | Production RAG without ops | 62ms | 180 |
| Weaviate Cloud | $95-175/mo | Hybrid search + multi-tenant SaaS | 58ms | 210 |
| Qdrant Cloud | $60-180/mo | Highest QPS, sparse+dense hybrid | 42ms | 290 |
| Milvus / Zilliz Cloud | $99-300+/mo | 100M+ vector scale | 71ms | 165 |
| Chroma | $0 self / $80 hosted | Prototyping, single-tenant apps | 118ms | 95 |
| pgvector | $50-120/mo | Postgres-native, cost-sensitive | 76ms | 140 |
Pinecone — Production RAG Default at $70+
Pinecone is the operational simplicity champion. Serverless Starter at $70/month gives 2M vectors of capacity, 1M monthly queries, and zero infrastructure management. Beyond Starter you pay per-million-vectors-stored ($0.33/M/month at the time of writing) and per-million-queries ($16.50/M reads). For a 10M-vector RAG production workload doing ~5M queries/month, that lands at roughly $140-220/month — competitive once you factor in not paying a DevOps engineer to babysit the stack.
The 2025 product upgrades closed Pinecone's prior weakness onmetadata filter selectivity. Pre-2024 Pinecone had a known weakness where high-selectivity filters (filter passes< 5% of docs) caused recall to drop. The new HNSW + filter-aware indexing released in late 2024 fixed this — in our test, recall@10 stayed above 0.96 even with 1% filter selectivity. p99 latency hit 62msat sustained 100 QPS, well under the 100ms target.
Where Pinecone wins decisively:uptime and operational simplicity. Across 30 days of test traffic we logged zero incidents. The serverless model means no capacity planning, no shard sizing, no compaction tuning. For a team without a dedicated database engineer, this is worth the ~30-50% price premium over a self-managed Qdrant or Milvus deployment.
Where it loses:cost at 100M+ scaleandhybrid search. Pinecone's native sparse+dense hybrid is good but Qdrant and Weaviate beat it on BM25 quality. At 100M vectors Pinecone Standard hits $800-1,400/month vs Milvus Zilliz at $400-700 with comparable performance.Full Pinecone review.
Weaviate — Hybrid Search + Multi-Tenancy at $25+
Weaviate Cloud Sandbox is $25/month for a small cluster — the cheapest entry-tier of any managed vector DB. The Serverless tier (released mid-2025) brings real per-usage billing: roughly $0.04 per 1K read units + $0.55/GB/month storage. At 10M vectors (~58GB) plus 5M queries/month our test bill landed at $95-175/month depending on read profile.
Weaviate'skiller feature is multi-tenancy: you can isolate millions of tenant namespaces in a single cluster, each with independent ACLs and configurable schema. This is the right answer for B2B SaaS apps where each customer needs their own embedding space. No other tool here handles this as cleanly — Pinecone requires per-namespace cost overhead, Qdrant requires collection-per-tenant which doesn't scale past ~10K tenants.
Weaviate'shybrid search(BM25 + dense + reranking) is the best in test. Native fusion algorithms (alpha-weighted, reciprocal rank fusion) gave us 0.94 nDCG@10 on a 1,200-query relevance set vs Pinecone's 0.89 and Qdrant's 0.92. If your RAG quality depends on retrieving via both semantic and keyword signals, Weaviate is the answer.
Where it loses:p99 jitter under sustained load. Our 60-minute 100 QPS sustained test showed Weaviate's p99 climbing from 58ms at minute 5 to 89ms at minute 50 — likely garbage collection or cache eviction at the scale we hit. For RAG apps where p99< 100ms is a hard SLA, this is concerning; you may need to overprovision by 20-30%. Full Weaviate review.
Qdrant — Highest QPS at $1.40/hour
Qdrant is the performance leader in this comparison. Qdrant Cloud's smallest production cluster (1 node, 4 vCPU, 16GB RAM) runs at ~$1.40/hour or ~$1,000/month — but the hourly billing makes it economical for sporadic workloads. Self-hosted Qdrant on a single AWS r6i.xlarge ($142/month equivalent) handled our 10M-vector workload at 290 QPS sustained withp99 of 42ms— the best raw performance in test.
TheRust coreshows: write throughput hit 89K vectors/sec on bulk insert vs Pinecone's 23K/sec. The 1M-vector bulk load completed in 11 seconds. The newsparse vector + dense vector fusion(released in v1.7) gives genuinely competitive hybrid search; combined with the optional ColBERT-style multi-vector mode, Qdrant now matches Weaviate on hybrid quality (0.93 nDCG vs 0.94) at meaningfully higher QPS.
Where it wins beyond performance:open-source-first licensing. The OSS core is Apache 2.0, the Cloud is the same engine — there's no "premium open-core" feature gating like some competitors. You can prototype self-hosted, move to Cloud, and move back without code changes.
Where it loses:operational complexity at scale. Qdrant's distributed/shard tier requires real DevOps attention. The Cloud product is excellent but pricier per-vector-stored than Pinecone Serverless once you cross 50M vectors. The community/enterprise documentation gap is also real — public docs cover OSS well, but enterprise observability + compliance setup needs vendor engagement.Full Qdrant review.
Milvus / Zilliz Cloud — Scale Champion at $99+
Milvus is the open-source vector database designed for 100M+ scale. Zilliz Cloud (the managed Milvus offering from the company that wrote Milvus) starts at $99/month for the Free Trial Plus tier; the Production Standard tier at ~$300/month handles billion-vector deployments natively. The architecture — separated compute and storage, S3-backed object store, native PartitionKey-based sharding — is the most production-mature for very-large-scale workloads.
In our 10M-vector test Zilliz Cloud was competitive but not best-in-class: p99 71ms, 165 QPS. Where Milvus shines is the scale we couldn't fairly test in 30 days — internal Zilliz benchmarks at 1B+ vectors show p99 latencies that none of the other tools can match because their architectures simply weren't designed for that range. If you're building OpenAI-scale RAG (>500M documents), Milvus/Zilliz is the default.
The 2026 product upgrades worth noting:Milvus 2.5 added native BM25 hybrid search(closing the gap with Weaviate/Qdrant), Zilliz Cloud addedserverless tier(March 2026 GA) that makes 10M-vector deployments dramatically cheaper than the dedicated tier ($99 vs $300+), and theAuto Indexfeature now picks IVF_FLAT / HNSW / DiskANN automatically based on your collection size — no more tuning expertise required.
Where it loses:operational complexity for small deployments. Self-hosted Milvus requires Kubernetes, etcd, Pulsar, MinIO/S3 — five components vs Qdrant's single binary. For a sub-50M-vector workload this is overkill. Zilliz Cloud removes the complexity but at 30-50% price premium over self-hosted alternatives.Full Milvus / Zilliz Cloud review.
Chroma — Prototyping King at $0
Chroma is the developer-experience champion:pip install chromadband you have a vector store running locally in 30 seconds. The Python API is the most ergonomic of any tool here — fewer lines of code to bootstrap a RAG prototype than even pgvector. The 2026 Chroma Cloud (hosted Chroma) tier at $80/month gave us 5M-vector capacity with managed backups.
For prototyping and single-tenant apps below ~5M vectors, Chroma is the right choice. The 0-to-RAG-prototype time is measured in minutes. The newChroma 1.0 stable release(April 2026) shipped sharding, better filter performance, and a meaningful upgrade to the HNSW implementation.
Where it loses:production scale. Our 10M-vector test pushed Chroma past its current sweet spot — p99 climbed to 118ms (above the 100ms target) and sustained QPS topped out at ~95. The new sharded mode helps but still trails Qdrant/Pinecone meaningfully at this scale. If your production target is< 5M vectors per collection, Chroma is fine; above that, plan to migrate.
pgvector — The "Just Postgres" Cost Winner at $50+
pgvector turned 2025 into its breakout year. The 0.7.0 release in mid-2024 added HNSW indexing; 0.8.0 in early 2025 added quantization and major HNSW performance improvements. The result: a 10M-vector pgvector deployment on a $50/month Supabase Pro instance or a $120/month AWS RDS db.r6g.xlarge now matches Pinecone Serverless on recall and lands within 25% on p99 (76ms vs 62ms).
The killer benefit isn't performance — it'soperational unity. Your embeddings live in the same Postgres database as your application data, with the same backups, the same migration tooling, the same SQL ergonomics, the same connection pooler. The cost saving is real (often 60-80% vs managed vector DBs at sub-50M-vector scale), but the bigger win is removing an entire piece of infrastructure from your stack.
Our test: HNSW index build on 10M vectors took 47 minutes (vs Pinecone's instant ingest), disk size landed at 71GB (vs Qdrant's 58GB), p99 76ms, sustained 140 QPS. Recall@10 of 0.94 was slightly under Pinecone's 0.96 — tunable up to 0.97 by increasing ef_search at the cost of 30% latency. Thepartial index + filterstory is the strongest of any tool here because it inherits Postgres's mature query planner.
Where pgvector loses:sustained high QPSbeyond 200/sec,billion-vector scale, andhybrid search. Postgres BM25 (via the new pg_search extension from Paradedb) is improving but still trails Weaviate/Qdrant native hybrid. For RAG apps where vector + keyword fusion matters, you'll bolt on a separate search tier (Tantivy, Meilisearch, OpenSearch).Full pgvector deep-dive.
Cost at 1M, 10M, and 100M Vectors
1M vectors, 1M queries/month
| Tool | Monthly | Notes |
|---|---|---|
| pgvector on Supabase Pro | $25 | + your existing Postgres |
| Chroma self-hosted on $20 VPS | $20 | Single-node, no HA |
| Weaviate Sandbox | $25 | Multi-tenant ready |
| Pinecone Serverless Starter | $70 | Includes 2M vec capacity |
| Qdrant Cloud entry | $60-100 | Best latency |
10M vectors, 5M queries/month
| Tool | Monthly | Notes |
|---|---|---|
| pgvector on RDS db.r6g.xlarge | $120 | Same DB as your app data |
| Weaviate Cloud Serverless | $95-175 | Multi-tenant + hybrid |
| Qdrant Cloud (self-managed cluster) | $140-180 | Best QPS |
| Pinecone Serverless | $140-220 | Easiest ops |
| Milvus Zilliz Serverless | $99-180 | Scale-ready |
100M vectors, 50M queries/month
| Tool | Monthly | Notes |
|---|---|---|
| Milvus Zilliz Standard | $400-700 | Scale leader |
| Qdrant Cloud Dedicated | $600-1,100 | High QPS |
| Pinecone Standard | $800-1,400 | Premium for ops simplicity |
| Weaviate Cloud Enterprise | $700-1,200 | Multi-tenant SaaS |
| Self-hosted Qdrant on r6i.4xlarge cluster | $450-650 | Requires DevOps |
When Each Wins
- Pinecone wins when:you have no dedicated DB engineer, you want zero ops, your scale is 1M-50M vectors, and you want a single SaaS bill.
- Weaviate wins when:you're building multi-tenant B2B SaaS, hybrid search quality matters, you need GraphQL API.
- Qdrant wins when:raw QPS / p99 is the hard requirement, you're OK with self-hosting, or you want open-source-first licensing without dual-license traps.
- Milvus/Zilliz wins when:your scale roadmap crosses 100M vectors, or you need DiskANN for cost-efficient cold storage.
- Chroma wins when:you're prototyping, building a single-tenant app under 5M vectors, or want the simplest Python API.
- pgvector wins when:you already run Postgres, you want unified backups + migrations, your scale is sub-50M vectors, and you don't need hybrid search.
Migration Between Tools
Vector database migration is non-trivial. The embeddings themselves are portable (just float arrays + IDs + metadata), so a full re-ingest from object storage is always possible. The pain points are: (1) index parameters don't transfer — HNSW M/ef values on Qdrant don't map 1:1 to Pinecone's serverless tuning; (2) filter syntax differs — Pinecone's metadata filter language doesn't directly translate to Weaviate's where filter or Qdrant's payload filter; (3) hybrid search algorithms differ — moving from Weaviate's hybrid to Pinecone hybrid means re-tuning alpha weights.
Practical migration approach: dual-write during transition (write to both old and new), run shadow queries to compare recall, cut over read traffic gradually 5%→25%→100% over 2-3 weeks. Budget 2 engineer-weeks for a clean cross-vendor migration at 10M-vector scale. The cheapest migrations in our experience: pgvector → Pinecone (3 days), Chroma → Qdrant (2 days), Pinecone → pgvector (1 week + Postgres tuning).
FAQ
Which vector database is fastest in 2026?
Qdrant for raw QPS and p99 latency — 290 QPS sustained with p99 42ms in our 10M-vector test. Pinecone is close second at 180 QPS / 62ms with the operational advantage. For sub-millisecond cold-start, Qdrant self-hosted on a dedicated instance wins.
Is pgvector production-ready in 2026?
Yes, for workloads up to ~50M vectors and ~200 QPS sustained. The 0.7+ HNSW implementation is mature. Above 50M vectors, plan to either shard with Citus or migrate to Milvus/Qdrant. Below 50M vectors, pgvector is often the highest-ROI choice because of operational unity with your application database.
How much does Pinecone cost at 10M vectors?
Serverless billing lands at $140-220/month for 10M vectors + 5M monthly queries based on our test. Standard (dedicated) starts higher at $250-400/month for similar capacity but includes lower per-query cost — break-even is roughly 20M queries/month.
Should I use Pinecone or Weaviate for B2B SaaS RAG?
Weaviate. Its multi-tenancy model handles thousands of customer namespaces in a single cluster cleanly, with per-tenant ACLs and schema flexibility. Pinecone's per-namespace overhead doesn't scale to multi-thousand tenants economically.
Can Chroma handle production traffic?
Below 5M vectors and 100 QPS, yes. Above that scale, p99 latency degrades and sustained throughput plateaus. The Chroma 1.0 sharded mode helps but Qdrant or pgvector are better long-term picks at production scale.
Does Milvus require Kubernetes?
For distributed mode, yes. Standalone mode (single-binary) is available for development. Zilliz Cloud removes the operational complexity entirely. For production deployments below 50M vectors, the Kubernetes overhead is usually not worth it — Qdrant single-binary is operationally simpler.
What's the cheapest path to RAG production in 2026?
pgvector on an existing Postgres + OpenAI text-embedding-3-small at $0.02 per 1M tokens. For a 1M-document corpus that's $50-80 in embeddings + $25-50/mo Postgres = roughly $100 startup + $25-50/mo ongoing. No other stack approaches this cost.
Which vector DB has the best hybrid (BM25 + dense) search?
Weaviate (0.94 nDCG in our test) edges out Qdrant (0.93). Both meaningfully ahead of Pinecone (0.89) and pgvector + pg_search (0.86). If hybrid retrieval quality is your primary metric, Weaviate is the answer.
Conclusion
The 2026 vector database market has a clear default for each use case.pgvectorwins for cost-and-operations conscious teams running sub-50M-vector workloads — and that's most teams.Pineconewins when ops simplicity matters more than $50-100/month of cost difference.Qdrantwins when raw performance is the gating constraint.Weaviatewins for multi-tenant SaaS and hybrid-search-quality-critical apps.Milvus/Zillizwins above 100M vectors.Chromawins for prototypes and small production apps.
The most common 2026 production pattern we now see: start with pgvector (90% of teams), migrate to Pinecone or Qdrant when you cross 20M vectors or 200 QPS sustained. Skip Milvus until you have a clear path to 100M+ vectors. Use Weaviate from day one if you're building multi-tenant SaaS. Use Chroma only for prototyping.Browse all vector and hosting toolsor compare individual contenders:Pinecone,Weaviate,Qdrant,Milvus,Chroma,pgvector.