Are Vector Databases Enough for Modern AI Workloads? Y/N

Why We Built Vector Lakebase: Rethinking Unstructured Data Architecture for AI - Zilliz blog



May 26, 2026 · 19 min read

James Luan


Recently, we launched Zilliz Vector Lakebase, the next evolution of Zilliz Cloud from a pure vector database system into a unified, lake-native data foundation for AI workloads. The announcement got a lot of interest. It also surfaced questions almost immediately about where Zilliz was headed.

Was Zilliz stepping away from vector databases? Or, put more directly: are vector databases already becoming obsolete?

I understand why these questions came up. For years, Zilliz has been known for building production-ready vector database systems (open-source Milvus and fully managed Zilliz Cloud). So when we started talking about evolving to a lake-native data foundation for AI, some people naturally wondered whether this meant a change in direction.

The short answer is NO. Absolutely NOT. If anything, Vector Lakebase is our answer to what happens after vector databases succeed.

Over the past several years, vector databases have become one of the foundational infrastructure layers of the AI stack. Adoption has grown faster than we could have imagined when we started Milvus nearly a decade ago. The category is real, and the need for semantic retrieval is only becoming more important.

But something else has become clear to us as well: vector retrieval is no longer the whole problem.

As AI systems move from static assistants into continuously running agents, enterprises are asking for something broader from their unstructured data infrastructure. They do not just want a system that can retrieve information. They want a system that can improve the data, reorganize it, analyze it, refine it, and feed those improvements back into production. That changes the architecture.

That shift reminds me of an earlier cycle in infrastructure history: the evolution of databases during the mobile internet era. The details are different, but the pattern is familiar. A new kind of application creates a new kind of data pressure. The first generation of infrastructure solves the immediate serving problem. Then, as the data grows, the architecture has to expand.

I think vector databases are entering that next stage now.

Mobile internet already went through this cycle once

Around 2010, as mobile applications exploded, MongoDB became one of the defining infrastructure products of that period.

The reason was straightforward. Mobile applications generated massive amounts of semi-structured data: user events, social activity, device telemetry, behavioral signals, product logs. None of these fit neatly into the relational database patterns most teams were using at the time. Product teams were shipping fast, schemas were changing constantly, and the first problem was simply to accept the data without slowing the application down. MongoDB solved that immediate problem very well: ingest the data first. Structure and analysis could come later.

Several years later, the industry started asking a different question. Once all this data existed, how could businesses actually use it? That shift helped drive the rise of modern data warehouses such as Snowflake and Redshift. The focus moved from operational storage to analytical insight. Companies wanted BI reports, user cohorts, attribution, forecasting, and growth analysis. Data stopped being only an operational byproduct and became a business asset.

Then another bottleneck emerged.

The divide between transactional systems and analytical systems became increasingly painful. Data pipelines between OLTP and OLAP environments were fragile, expensive, and operationally exhausting. The same datasets were copied repeatedly across systems, often with synchronization delays and subtle inconsistencies.

That was the environment that gave rise to the Lakehouse architecture. Databricks, Iceberg, Hudi, and related systems all converged around the same basic idea: a single logical copy of data should support multiple computation models...
