CodeX
Everything connected with Tech & Code. Follow to join our 1M+ monthly readers
Most “Chat With Your Data” Products Will Fail
Jatin Solanki
3 min read · May 7, 2026
The problem isn’t SQL generation. The problem is that AI has no idea which data to trust.
Image (AI-generated): the difference with and without context

Everyone is building AI agents for analytics right now.

“Ask questions in natural language.”
“Generate SQL instantly.”
“Chat with your warehouse.”

The demos look magical.

Until the system hits a real enterprise environment.

Not a toy dataset.
Not 12 clean tables in Snowflake.

I mean:

- 10,000+ tables
- duplicated metrics
- inconsistent naming
- broken lineage
- undocumented columns
- conflicting business definitions
- legacy pipelines
- stale dashboards
- tribal knowledge hidden in Slack

That is where most AI SQL systems quietly collapse.

The problem is not the LLM.
The problem is CONTEXT.

The Industry Is Optimising the Wrong Layer

Most teams are obsessing over:

- GPT-5 vs Claude vs Gemini
- fine-tuning
- larger context windows
- agent frameworks
- prompt engineering

But none of these solve the actual enterprise problem.

Because enterprise analytics is not a language problem.
It is a context retrieval problem.

An LLM cannot magically understand:

- which revenue table is trusted
- which dataset is deprecated
- which metric finance uses
- which pipeline failed yesterday
- which dashboard powers the board meeting
- which schema contains PII
- which transformation changed last week

Without context, SQL generation becomes probabilistic guessing.

Challenge with Scale

Let’s take a simple request:

“Generate the sales report and revenue trend for the last 6 weeks.”
Sounds easy.

Now imagine the warehouse contains:

- 10,000 tables
- 120,000 columns
- 15 business domains
- 7 duplicated revenue models
- 3 semantic definitions of “customer”
- 40 dbt projects
- multiple BI tools
- several historical migrations

The AI now faces a massive search problem.

The Naive Architecture Everyone Starts With

Most first-generation AI analytics systems work like this (pseudocode):

    # Naive approach: attach the entire catalog to every request.
    user_prompt = "Generate sales report for last 6 weeks"
    context = get_all_metadata()  # metadata for every table in the warehouse
    llm.generate(user_prompt + context)

This works beautifully in demos.

Then reality arrives.

If each table contributes even 200 tokens of metadata:

    10,000 tables × 200 tokens = 2,000,000 input tokens

Completely impractical.

Even if the model supports it:

- latency explodes
- cost becomes absurd
- hallucinations increase
- accuracy drops
- retrieval quality deteriorates

Large context windows are not the solution.
They are a temporary patch.

The Real Architecture Enterprises Need

Modern enterprise AI systems need a retrieval-first architecture.
Not a bigger prompt.

The future stack looks more like this:

1. User Query
2. Semantic Understanding
3. Metadata Retrieval
4. Lineage Context Expansion
5. Trust Scoring
6. Relevant Dataset Selection
7. SQL Generation
8. Validation + Execution

The LLM should never see all 10,000 tables.

It should only see:

- the right 10–30 tables
- trusted metrics
- business definitions
- lineage relationships
- governance signals
- observability signals

That changes everything.

Why Metadata Alone Is Not Enough

This is where many catalog vendors also struggle.

Metadata alone does not create intelligence.
You need connected context.

There is a massive difference between:

“Here are all the tables”

AND

“Here are the trusted datasets finance uses for revenue reporting with active downstream dashboards and no freshness incidents.”

That second layer requires:

- lineage
- usage patterns
- quality scoring
- ownership
- business glossary
- incident history
- semantic relationships
- domain modeling

This is why the next generation of platforms is shifting from static catalogs to context engines.

Future of Data Stack

The LLM is the final reasoning engine.
Not the primary search engine.
Context Layer is the future
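
To make the retrieval-first flow described above a little more concrete, here is a minimal sketch of the middle of that stack: metadata retrieval, lineage expansion, trust scoring, and dataset selection. Everything in it is an assumption made for illustration: the `TableMeta` fields, the keyword matching (a real system would more likely use embeddings or a search index), the trust weights, and the table names are all hypothetical, not taken from any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class TableMeta:
    """Hypothetical catalog entry; every field name here is an assumption."""
    name: str
    description: str
    certified: bool = False          # governance signal
    deprecated: bool = False         # governance signal
    open_incidents: int = 0          # observability signal
    downstream_dashboards: int = 0   # usage signal
    upstream: list = field(default_factory=list)  # lineage: parent table names


def keyword_score(query: str, table: TableMeta) -> float:
    """Toy relevance: share of query words that appear in the name or description."""
    words = set(query.lower().split())
    text = (table.name.replace("_", " ") + " " + table.description).lower()
    hits = sum(1 for w in words if w in text)
    return hits / len(words) if words else 0.0


def trust_score(table: TableMeta) -> float:
    """Toy trust heuristic; the weights are arbitrary illustrations."""
    if table.deprecated:
        return 0.0
    score = 0.5
    if table.certified:
        score += 0.3
    if table.downstream_dashboards > 0:
        score += 0.2
    score -= 0.2 * min(table.open_incidents, 2)
    return max(score, 0.0)


def select_context(query, catalog, top_k=3, min_trust=0.3):
    """Retrieve candidates, expand via lineage, then rank by relevance x trust."""
    by_name = {t.name: t for t in catalog}
    relevance = {t.name: keyword_score(query, t) for t in catalog}
    candidates = {name for name, r in relevance.items() if r > 0}
    # Lineage expansion: pull in the direct upstream parents of each candidate,
    # giving them half of the child's relevance.
    for name in list(candidates):
        for parent in by_name[name].upstream:
            if parent in by_name:
                candidates.add(parent)
                relevance[parent] = max(relevance[parent], 0.5 * relevance[name])
    # Drop tables below the trust threshold, then keep the top_k by combined score.
    trusted = [n for n in candidates if trust_score(by_name[n]) >= min_trust]
    trusted.sort(key=lambda n: relevance[n] * trust_score(by_name[n]), reverse=True)
    return trusted[:top_k]


catalog = [
    TableMeta("fct_revenue", "certified weekly sales revenue used by finance",
              certified=True, downstream_dashboards=4, upstream=["stg_orders"]),
    TableMeta("revenue_v1_old", "legacy sales revenue model", deprecated=True),
    TableMeta("tmp_revenue_copy", "ad hoc sales revenue copy", open_incidents=2),
    TableMeta("stg_orders", "staged order lines"),
    TableMeta("dim_employee", "employee directory"),
]

selected = select_context("sales revenue trend", catalog)
print(selected)
```

In this toy catalog, the deprecated model and the copy with open incidents are filtered out even though they match the query text just as well as the certified table. Only the selected names (and their metadata) would then be handed to the LLM for SQL generation.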
This Is Why Long Context Windows Alone Won’t Solve Enterprise AI

Even if models support:

- 1M tokens
- 10M tokens someday

you still do NOT want to dump entire enterprise metadata.

Because:

- attention quality degrades
- irrelevant context pollutes reasoning
- hallucination probability increases
- response time grows massively

More context is not always better context.

Relevant context wins.
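
As a rough illustration of the token arithmetic, the sketch below compares dumping the full catalog with sending only a retrieved subset, then greedily packs ranked tables into a fixed budget. The 200-tokens-per-table figure comes from the estimate earlier in this article; the table names, per-table token counts, and the 600-token budget are made-up assumptions.

```python
def metadata_tokens(num_tables: int, tokens_per_table: int = 200) -> int:
    """Estimated prompt tokens spent on table metadata alone."""
    return num_tables * tokens_per_table


def pack_context(ranked_tables, budget_tokens):
    """Greedily keep the highest-ranked tables that still fit the token budget.

    ranked_tables: list of (table_name, metadata_token_count), best first.
    """
    chosen, used = [], 0
    for name, tokens in ranked_tables:
        if used + tokens <= budget_tokens:
            chosen.append(name)
            used += tokens
    return chosen, used


full_dump = metadata_tokens(10_000)   # every table in the warehouse
retrieved = metadata_tokens(30)       # only the 30 most relevant tables
print(full_dump, retrieved)

# Hypothetical ranked retrieval output with per-table metadata sizes.
ranked = [("fct_revenue", 250), ("dim_customer", 180),
          ("stg_orders", 400), ("dim_date", 90)]
chosen, used = pack_context(ranked, budget_tokens=600)
print(chosen, used)
```

Under these assumptions the full dump costs 2,000,000 tokens against 6,000 for the retrieved subset. The packing step is one simple way to enforce a budget; a production system might instead trim per-table metadata or summarize low-ranked tables.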
The Real Bottleneck in Enterprise AI

The future bottleneck is not:

“Can the LLM understand SQL?”

It already can.

The bottleneck is:

“Can your platform retrieve the correct enterprise context with high trust and low latency?”
That is exactly why the “Data Context Layer” category is emerging so aggressively right now.
Data Context
AI
Sql
Data Engineering
Python