Real Cost of Vector Workloads


Vector Storage Costs: S3, OpenSearch, pgvector, Pinecone

At re:Invent 2025, AWS announced that Amazon S3 Vectors was GA with a 40x capacity bump (2 billion vectors per index). The widely repeated headline was "up to 90% lower cost than specialized vector databases." That claim still holds, but as with almost any technology, the details depend heavily on your situation and workload.

I have been working on this article for a while, but on May 28, 2026, AWS quietly knocked out one of the comparison's load-bearing arguments. The next-generation Amazon OpenSearch Serverless went GA with scale-to-zero compute and no minimum OCU floor. I'm a big fan of serverless services, so seeing one fit that definition far better than before is great news. The "$700/month idle" criticism that disqualified OpenSearch from every dev, demo, and bursty workload last year is mostly gone now. So I reworked this post to compare S3 Vectors against the new OpenSearch Serverless (now called "NextGen"), with the old architecture (now "Classic") available as a historical reference column in the calculator.

The S3 Vectors pricing model is still one of the most workload-sensitive AWS has shipped in years. S3 Vectors charges almost nothing when you don't query, then ramps hard with both query count and index size. NextGen OpenSearch Serverless now also scales to zero, but you pay a 10-30s cold start (we measured ~15s at 50K vectors) on the first query after idle. Aurora pgvector falls in the middle. Pinecone Serverless has a minimum that swallows hobby workloads but flattens out at scale. None of these stores is the cheapest at every shape of workload, and with NextGen reshuffling the deck the crossovers between them are surprising.

This post walks through a hands-on cost model and a live benchmark harness across all four stores. Everything in here runs from one Terraform apply and a make load && make bench, on Python 3.14, against us-east-1. The companion repo is at github.com/RDarrylR/aws-vector-hosting-comparison. You can clone it and try the working four-way comparison yourself.

Why the blog comparisons don't always tell the full story with a real workload

Most "S3 Vectors vs X" articles you can read today pick a single point on the workload curve and report the winner at that point. The trouble is that the four contenders use four genuinely different pricing models, and the rank order flips three times as you move along just one axis - query rate.
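To make the rank-flip idea concrete, here is a minimal sketch that scans a range of query rates and records each point where the cheapest store changes. Everything in it is an assumption for illustration: the `linear` and `rank_flips` helpers, the store names, and especially the coefficients, which are placeholders rather than real prices for any of the four stores.

```python
from typing import Callable

CostFn = Callable[[float], float]  # queries per month -> dollars per month


def linear(fixed: float, per_million: float, minimum: float = 0.0) -> CostFn:
    """Toy model: fixed monthly cost plus a per-million-queries rate,
    with an optional minimum monthly bill."""
    def cost(queries: float) -> float:
        return max(minimum, fixed + per_million * queries / 1_000_000)
    return cost


def rank_flips(models: dict[str, CostFn], rates: list[float]) -> list[tuple[float, str]]:
    """Return (query_rate, cheapest_store) each time the cheapest store changes."""
    flips: list[tuple[float, str]] = []
    for q in rates:
        winner = min(models, key=lambda name: models[name](q))
        if not flips or flips[-1][1] != winner:
            flips.append((q, winner))
    return flips


# Placeholder coefficients, NOT real prices for any vendor.
toy_models = {
    "store_a": linear(fixed=0.0, per_million=10.0),    # near-zero idle, steep per query
    "store_b": linear(fixed=40.0, per_million=2.0),    # moderate floor, moderate slope
    "store_c": linear(fixed=100.0, per_million=0.5),   # high floor, flat at scale
}
rates = [10**k for k in range(4, 9)]  # 10K to 100M queries per month

print(rank_flips(toy_models, rates))
```

With these toy numbers the winner changes as the query rate grows, which is the pattern a single-point comparison hides. Swapping in your own cost functions (including non-linear ones) only requires that each takes a monthly query count and returns dollars.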

| Store | What you actually pay for |
| --- | --- |
| Amazon S3 Vectors | Storage per logical GB-month + per-GB writes + per-million query API + per-TB query data processed |
| Amazon OpenSearch Serverless NextGen | OCU-hours when warm, $0 compute when idle (10-min idle timeout), + S3-backed storage per GB-month |
| Amazon OpenSearch Serverless Classic (pre-May 2026) | OCU-hours with 2+2 (or 1+1) OCU minimum floor + S3-backed storage per GB-month |
| Aurora PostgreSQL Serverless v2 + pgvector | ACU-hours + storage per GB-month + I/O per million (or I/O-Optimized flat rate) |
| Pinecone Serverless (Standard) | Storage per GB-month + Read Units + Write Units + $50/month minimum |

That last column is the only thing that matters. As of May 28, 2026, three of the AWS-native options can reach a true idle-cost floor in different ways: S3 Vectors has no provisioned compute, NextGen OSS can scale compute to zero after a 10-minute idle, and Aurora Serverless v2 can auto-pause when configured with min_capacity = 0. Pinecone's different: the Starter tier can be free for small demos, but the paid Standard tier still carries a $50/month minimum regardless of usage. This demo deliberately keeps Aurora at a min_capacity = 0.5 floor so its warm queries stay sub-100ms; if you'd rather pay nothing while idle and accept the resume latency on the next query, drop the floor to 0 in infrastructure/modules/aurora-pgvector/main.tf.

The "cheapest store for a vector index" question has no answer until you commit to a workload shape, and most workloads have several shapes layered on top of each other (idle development, bursty user traffic, periodic batch reindexing).
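One way to reason about a scale-to-zero store under bursty traffic is to estimate, from a trace of query timestamps, how long compute stays warm and how many cold starts users hit. The sketch below assumes a simplified model that is not taken from AWS billing documentation: compute is warm from each query until the 10-minute idle timeout expires, any gap longer than the timeout causes a cold start, and wake-up time and billing granularity are ignored. The function name is hypothetical.

```python
IDLE_TIMEOUT_S = 10 * 60  # 10-minute idle timeout listed for NextGen


def warm_seconds_and_cold_starts(
    query_times_s: list[float],
    idle_timeout_s: float = IDLE_TIMEOUT_S,
) -> tuple[float, int]:
    """Estimate total warm time and cold-start count for a query trace.

    Assumed model: compute stays warm until `idle_timeout_s` after the most
    recent query; a longer gap scales compute to zero, so the next query
    pays a cold start.
    """
    if not query_times_s:
        return 0.0, 0

    times = sorted(query_times_s)
    warm = 0.0
    cold_starts = 1  # the first query always wakes the store from zero

    for prev, cur in zip(times, times[1:]):
        gap = cur - prev
        if gap > idle_timeout_s:
            # Store idled out: it was warm for the timeout, then slept.
            warm += idle_timeout_s
            cold_starts += 1
        else:
            # Store stayed warm for the whole gap.
            warm += gap

    warm += idle_timeout_s  # tail after the last query before scale-down
    return warm, cold_starts


# Three queries a minute apart, then one more an hour after the first.
warm, colds = warm_seconds_and_cold_starts([0, 60, 120, 3600])
print(warm, colds)
```

For that trace the estimate is 1,320 warm seconds and two cold starts. Multiplying warm hours by an OCU count and your region's OCU-hour rate gives a rough compute figure; both of those inputs are left out here because they depend on the collection and are not stated in this post.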

So instead of a single answer, the goal of this post is a decision framework with the math underneath it, plus a repo you can clone to plug in your own numbers.

The four contenders, current versions

We are running this on the May 2026 release of each:

Amazon S3 Vectors: GA December 2025, expanded to 17 additional regions in March 2026. Up to 2B vectors per index, 10K indexes per vector bucket. Native PUT/GET/Query/Delete Vector APIs. SSE-S3 or SSE-KMS encryption.

Amazon OpenSearch Serverless NextGen: GA May 28, 2026. VECTORSEARCH collection type, FAISS HNSW. Scale-to-zero on a 10-minute idle timeout, no minimum OCU floor when configured for it. Compute and storage decoupled via a "collection group" resource. AWS documents 10-30s first-query latency when waking from zero; we measured ~15s at 50K vectors in this demo. Once warm, client-side query latency sits...
