Most "Chat with Your Data" Products Will Fail



CodeX

Everything connected with Tech & Code. Follow to join our 1M+ monthly readers


Jatin Solanki

3 min read · May 7, 2026


The problem isn’t SQL generation. The problem is that AI has no idea which data to trust.

Figure (AI-generated): the difference with context and without.

Everyone is building AI agents for analytics right now. “Ask questions in natural language.” “Generate SQL instantly.” “Chat with your warehouse.”

The demos look magical. Until the system hits a real enterprise environment. Not a toy dataset. Not 12 clean tables in Snowflake. I mean:

- 10,000+ tables
- duplicated metrics
- inconsistent naming
- broken lineage
- undocumented columns
- conflicting business definitions
- legacy pipelines
- stale dashboards
- tribal knowledge hidden in Slack

That is where most AI SQL systems quietly collapse. The problem is not the LLM. The problem is CONTEXT.

The Industry Is Optimising the Wrong Layer

Most teams are obsessing over:

- GPT-5 vs Claude vs Gemini
- fine-tuning
- larger context windows
- agent frameworks
- prompt engineering

But none of these solve the actual enterprise problem. Because enterprise analytics is not a language problem. It is a context retrieval problem. An LLM cannot magically understand:

- which revenue table is trusted
- which dataset is deprecated
- which metric finance uses
- which pipeline failed yesterday
- which dashboard powers the board meeting
- which schema contains PII
- which transformation changed last week

Without context, SQL generation becomes probabilistic guessing.

Challenge with Scale

Let’s take a simple request: “Generate the sales report and revenue trend for the last 6 weeks.”

Sounds easy. Now imagine the warehouse contains:

- 10,000 tables
- 120,000 columns
- 15 business domains
- 7 duplicated revenue models
- 3 semantic definitions of “customer”
- 40 dbt projects
- multiple BI tools
- several historical migrations

The AI now faces a massive search problem.

The Naive Architecture Everyone Starts With

Most first-generation AI analytics systems work like this:

    user_prompt = "Generate sales report for last 6 weeks"
    context = get_all_metadata()  # metadata for every table in the warehouse
    llm.generate(user_prompt + context)

This works beautifully in demos. Then reality arrives. If each table contributes even 200 tokens of metadata:

    10,000 tables × 200 tokens = 2,000,000 input tokens

Completely impractical. Even if the model supports it:

- latency explodes
- cost becomes absurd
- hallucinations increase
- accuracy drops
- retrieval quality deteriorates

Large context windows are not the solution. They are a temporary patch.

The Real Architecture Enterprises Need

Modern enterprise AI systems need a retrieval-first architecture. Not a bigger prompt. The future stack looks more like this:

1. User Query
2. Semantic Understanding
3. Metadata Retrieval
4. Lineage Context Expansion
5. Trust Scoring
6. Relevant Dataset Selection
7. SQL Generation
8. Validation + Execution

The LLM should never see all 10,000 tables. It should only see:

- the right 10–30 tables
- trusted metrics
- business definitions
- lineage relationships
- governance signals
- observability signals

That changes everything.

Why Metadata Alone Is Not Enough

This is where many catalog vendors also struggle. Metadata alone does not create intelligence. You need connected context. There is a massive difference between:

“Here are all the tables”

AND

“Here are the trusted datasets finance uses for revenue reporting with active downstream dashboards and no freshness incidents.”

That second layer requires:

- lineage
- usage patterns
- quality scoring
- ownership
- business glossary
- incident history
- semantic relationships
- domain modeling

This is why the next generation of platforms is becoming context engines rather than static catalogs.

Future of Data Stack

The LLM is the final reasoning engine. Not the primary search engine.
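To make the retrieval-first idea concrete, here is a minimal sketch of the "Trust Scoring" and "Relevant Dataset Selection" steps. Everything in it is an assumption for illustration: the `TableContext` fields, the keyword-overlap relevance function, and the trust weights are hypothetical and not taken from any real catalog product. A production system would more likely use embeddings and real lineage and incident data.

```python
from dataclasses import dataclass


@dataclass
class TableContext:
    """Hypothetical metadata record for one warehouse table."""
    name: str
    description: str
    certified: bool = False          # governance signal: owner-approved dataset
    deprecated: bool = False         # governance signal: should not be used
    open_incidents: int = 0          # observability signal: freshness/quality issues
    downstream_dashboards: int = 0   # usage signal derived from lineage


def relevance(query: str, table: TableContext) -> float:
    """Toy relevance: share of query words found in the table name or description."""
    words = set(query.lower().split())
    text = set((table.name.replace("_", " ") + " " + table.description).lower().split())
    return len(words & text) / len(words) if words else 0.0


def trust(table: TableContext) -> float:
    """Toy trust score; the weights are illustrative, not tuned."""
    if table.deprecated:
        return 0.0
    score = 0.4
    if table.certified:
        score += 0.3
    if table.downstream_dashboards > 0:
        score += 0.2
    if table.open_incidents == 0:
        score += 0.1
    return score


def select_context(query: str, catalog: list[TableContext], k: int = 30) -> list[TableContext]:
    """Rank tables by relevance x trust and keep at most k of them for the prompt."""
    scored = [(relevance(query, t) * trust(t), t) for t in catalog]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [t for _, t in ranked[:k]]


# Example catalog (made-up tables) and a query similar to the one in the article.
catalog = [
    TableContext("fin_revenue_weekly", "certified weekly sales revenue used by finance",
                 certified=True, downstream_dashboards=4),
    TableContext("revenue_tmp_2019", "legacy sales revenue copy", deprecated=True),
    TableContext("hr_headcount", "employee headcount by team"),
]
selected = select_context("sales revenue last 6 weeks", catalog, k=2)
print([t.name for t in selected])
```

In this sketch the deprecated revenue copy is dropped even though it matches the query words, because its trust score is zero. That is the point of combining relevance with governance and observability signals before the LLM sees anything.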


Context Layer is the future

This Is Why Long Context Windows Alone Won’t Solve Enterprise AI

Even if models support:

- 1M tokens
- 10M tokens someday

you still do NOT want to dump your entire enterprise metadata into the prompt. Because:

- attention quality degrades
- irrelevant context pollutes reasoning
- hallucination probability increases
- response time grows massively

More context is not always better context. Relevant context wins.
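As a rough illustration of "relevant context wins", the sketch below compares the prompt size of the dump-everything approach with a focused selection, then packs ranked metadata snippets into a fixed token budget. The 200 tokens per table is the article's own figure. The whitespace-based token counter and the snippet texts are assumptions made for this example; a real system would use the model's tokenizer.

```python
TOKENS_PER_TABLE = 200  # per-table metadata estimate used in the article


def metadata_tokens(num_tables: int, tokens_per_table: int = TOKENS_PER_TABLE) -> int:
    """Estimated prompt tokens needed to describe num_tables tables."""
    return num_tables * tokens_per_table


def rough_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def pack_context(ranked_snippets: list[str], budget_tokens: int) -> list[str]:
    """Add snippets in ranked order and stop at the first one that would exceed the budget."""
    packed: list[str] = []
    used = 0
    for snippet in ranked_snippets:
        cost = rough_token_count(snippet)
        if used + cost > budget_tokens:
            break  # lower-ranked snippets never displace higher-ranked ones
        packed.append(snippet)
        used += cost
    return packed


naive_tokens = metadata_tokens(10_000)   # every table in the warehouse
focused_tokens = metadata_tokens(30)     # only the selected 30 tables
print(naive_tokens, focused_tokens)      # 2000000 6000

# Hypothetical snippets, already ordered from most to least relevant.
snippets = [
    "fin_revenue_weekly: certified weekly revenue by region",
    "sales_orders: one row per order line",
    "hr_headcount: employee headcount by team",
]
print(pack_context(snippets, budget_tokens=12))
```

With a budget of 12 rough tokens only the first two snippets fit, so the least relevant table never reaches the model. The budget itself is a design choice: it caps cost and latency regardless of how large the warehouse grows.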

The Real Bottleneck in Enterprise AI

The future bottleneck is not: “Can the LLM understand SQL?”

It already can. The bottleneck is: “Can your platform retrieve the correct enterprise context with high trust and low latency?”

That is exactly why the “Data Context Layer” category is emerging so aggressively right now.
