My Notes After Databricks Data and AI Summit 2026


A Few Notes from Databricks Data + AI Summit 2026: Why the Data Layer Matters Again
Jun 30, 2026 · 13 min read

James Luan


After this year’s Databricks Data + AI Summit, I found myself thinking less about any single announcement and more about a question that has been sitting with me for a while:

When AI really moves into production, what does the data layer become?

My current answer is simple, though the implications are not: in this cycle, the data layer is the part of the AI stack that has been repriced the slowest. That is starting to change.

Data: the part of the AI stack the market has not priced yet

Algorithms have been repriced in public. Models improve quickly, and the industry can see the progress almost every week. Compute has been repriced by NVIDIA, the cloud providers, and the capital markets. Everyone understands that GPUs matter.

Data has moved more slowly. Not because it matters less. The opposite is true. Data is slow to reprice because it is hard to talk about and even harder to fix. Enterprise data is messy, scattered, duplicated, stale, and full of permissions that nobody fully understands. Business semantics do not line up cleanly across systems. The thing people call “real time” is often still a scheduled job that ran sometime last night.

That work is painful. It is also not very glamorous. But once AI moves from demos into production, the pain becomes impossible to hide.

In conversations with people building and training models, including those at OpenAI and Anthropic, the discussion often comes back to the same point. Models are converging. Compute can be bought, at least if you have enough money. The defensible layer is increasingly the data: its quality, its freshness, the permissions around it, and the speed at which it can be turned into useful context.

This is not only an application-layer problem. Inside model companies, model quality still depends heavily on the data pipeline. A training run may require days of preparation before the first serious experiment begins. If an upstream field is dirty, a batch is mislabeled, or a filtering rule is wrong, days of compute and waiting can disappear before anyone notices the loss curve has drifted.
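To make the "dirty field, mislabeled batch" case concrete, here is a minimal sketch of a gate that could run before a batch reaches training. It is an illustration under stated assumptions, not a description of how any model company does it: `validate_batch`, `ALLOWED_LABELS`, the field names, and the 1% threshold are all hypothetical. It also only catches the cheap failures (empty fields, unknown labels); a wrong filtering rule would need distribution-level checks that this sketch does not attempt.

```python
from collections import Counter

# Assumption: an illustrative label set, not taken from any real pipeline.
ALLOWED_LABELS = {"positive", "negative"}


def validate_batch(rows, required_fields=("text", "label"), max_bad_fraction=0.01):
    """Count obvious problems in a batch and decide whether it is safe to train on."""
    problems = Counter()
    for row in rows:
        # A required field that is absent or empty counts as dirty.
        missing = [f for f in required_fields if not row.get(f)]
        if missing:
            problems["missing_field"] += 1
        elif row["label"] not in ALLOWED_LABELS:
            problems["unknown_label"] += 1
    bad = sum(problems.values())
    # An empty batch is treated as fully bad rather than silently accepted.
    bad_fraction = bad / len(rows) if rows else 1.0
    return {
        "ok": bad_fraction <= max_bad_fraction,
        "bad_fraction": bad_fraction,
        "problems": dict(problems),
    }


# Hypothetical batch: one empty text field, one label outside the allowed set.
batch = [
    {"text": "good example", "label": "positive"},
    {"text": "", "label": "negative"},
    {"text": "odd example", "label": "maybe"},
    {"text": "fine example", "label": "negative"},
]
report = validate_batch(batch)
```

The point of a check like this is timing: it fails in seconds, before the compute is spent, instead of surfacing days later as a drifting loss curve.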

AI agents make the data problem impossible to hide

Agents expose the same problem in a more operational form.

When AI agents fail in production, the first cause is often not that the model is incapable. It is that the model is acting on the wrong context: a record it cannot access, a document that expired six hours ago, a data source that quietly changed overnight, or a retrieval path that is too expensive to use often enough. I recently saw a strong team lose nearly a week to a stale context pipeline. The agent was confidently answering yesterday’s question. The model was not dumb. The context was wrong, and the system had no clean way to prove where the error entered the loop.

That is the failure mode that matters. The next infrastructure bottleneck is not simply better reasoning. It is fresh, trusted, cheap, and auditable context at the moment a model or agent makes a decision.
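One way to picture "fresh, trusted, and auditable" is a small gate between retrieval and the model. The sketch below is a hypothetical illustration, not an API from Databricks, Zilliz, or any other product: `ContextRecord`, `AuditEntry`, `admit_context`, and the six-hour `max_age` are all assumptions made for the example. It filters retrieved records by permission and age, and writes an audit entry for every decision so that a stale or blocked record can be traced later.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ContextRecord:
    """A retrieved piece of context plus the metadata needed to trust it (hypothetical shape)."""
    doc_id: str
    text: str
    source: str
    updated_at: datetime
    allowed_roles: frozenset


@dataclass
class AuditEntry:
    """One admission decision, kept so an error can be traced back to its source."""
    doc_id: str
    source: str
    admitted: bool
    reason: str
    checked_at: datetime


def admit_context(records, role, now, max_age, audit_log):
    """Return only records that `role` may read and that are fresh, logging every decision."""
    admitted = []
    for rec in records:
        if role not in rec.allowed_roles:
            reason = "permission_denied"
        elif now - rec.updated_at > max_age:
            reason = "stale"
        else:
            reason = "ok"
        audit_log.append(AuditEntry(rec.doc_id, rec.source, reason == "ok", reason, now))
        if reason == "ok":
            admitted.append(rec)
    return admitted


# Hypothetical data: one usable record, one that expired, one the agent may not read.
now = datetime(2026, 6, 30, 12, 0, tzinfo=timezone.utc)
records = [
    ContextRecord("a", "current pricing", "crm", now - timedelta(hours=1), frozenset({"support"})),
    ContextRecord("b", "old policy", "wiki", now - timedelta(hours=7), frozenset({"support"})),
    ContextRecord("c", "salary data", "hr", now - timedelta(minutes=5), frozenset({"hr"})),
]
audit_log = []
usable = admit_context(records, "support", now, timedelta(hours=6), audit_log)
```

The "cheap" part is not addressed here. In this sketch the audit log is the piece that matters most: it is what lets a team show where the wrong context entered the loop instead of guessing.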

That is why I think the data layer is the next part of the AI stack to be repriced.

Databricks is aiming at the right problem

I am skeptical of many products that call themselves “AI data platforms.” Too often the story arrives before the system.

Databricks is different enough that I think it deserves serious attention. Two things stood out to me at the Summit.

The first is still the engineering culture. At Databricks’ scale, it would be easy for the company to become purely sales-driven. Yet the founders are still on stage talking about execution engines, transactions, real-time analytics, and the pipes underneath the product. I respect that. You can feel when a company still has product and engineering intuition at its core. It shows up in small architectural decisions long before it shows up in a keynote.

The second is the customer base. The users I spoke with at the Summit were not talking about AI as a demo layer. They were trying to push AI into production systems, and the problems they described were much more concrete: agents need to read and write business state; real-time analytics cannot keep paying the tax of...
