Beyond the Semantic Layer: Building a Context Layer for the Agentic Era
Simon Späti, Data Engineer & Technical Author
June 4, 2026 · Last reviewed June 4, 2026 · 16 min read
Topics: Context layer, Semantic layer, AI data agent
At a glance
A context layer puts your warehouse schema, joins, metric definitions, and business knowledge in one reviewable place so data agents query governed context instead of guessing field names. A look at how it works, and at ktx, the open-source context layer.
Metrics, schema, dashboard logic, and domain knowledge in one place. Your team reviews. Agents query.
Writing SQL was never the hard part. Making it accurate and trustworthy against your warehouse always was. Point an AI agent like Claude or Codex at your data stack and ask a real analytics question, and the answer is usually mediocre: the agent can scrape some context from your git repos or whatever metadata it can find, but it doesn't know your joins, your metric definitions, or the business rules that give a number its actual meaning.
So how do we make data agents reliable and accurate for database queries? Everyone talks about harnesses, evals, and context layers, but the real challenge is bringing them together with data engineering and the context you already have, such as database schemas, a semantic layer, and metric definitions, plus the business knowledge that normally never reaches the agent.
That's the question this post tackles: how agents can work with the data stack and analytics, and how a context layer fits in. We also take an inside look at ktx, a new context layer that reads not only from the usual sources but also from less obvious ones (Markdown, Notion, etc.), driven by agentic workers.
The idea is to pull two kinds of knowledge into one reviewable place: the hard semantics (your warehouse schema, joins, and metric definitions as YAML and SQL) and the soft semantics (the business context living in docs, wikis, and Notion that agents usually never see). Both are committed to git and reviewed like code, so a human stays in the loop while agents get a warm start instead of a cold database connection. The payoff: more accurate answers with fewer (and cheaper) queries against the warehouse.
Figure: A context layer sits between your data sources and your agent. The sources (warehouse: schema and metrics; BI tools: dashboards and joins; Notion and wiki: soft semantics; docs and Markdown: notes and context) feed a context layer that is auto-built and reviewed in git like code. It holds hard semantics (YAML and SQL the warehouse runs) and soft semantics (Markdown the team reads), and serves governed, accurate SQL to the AI agent.
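To make the split between hard and soft semantics more concrete, here is a minimal Python sketch of the idea. It is not ktx's file format or API: the `Metric`, `Note`, and `ContextLayer` classes, the `sql_for` and `notes_for` methods, and the `orders` table are all hypothetical names chosen for illustration. The point is only that an agent asking for a metric gets SQL built from a reviewed definition, along with the business notes that explain it, rather than guessing column names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Metric:
    """Hard semantics: a governed metric definition (hypothetical shape)."""
    name: str
    table: str
    expression: str  # SQL aggregate, e.g. "SUM(amount)"
    filters: list = field(default_factory=list)


@dataclass
class Note:
    """Soft semantics: business context that would normally live in a doc."""
    topic: str
    text: str


@dataclass
class ContextLayer:
    metrics: dict
    notes: list

    def sql_for(self, metric_name: str, group_by: Optional[str] = None) -> str:
        """Build SQL from the reviewed definition instead of guessing fields."""
        m = self.metrics[metric_name]  # raises KeyError for ungoverned metrics
        select = f"{m.expression} AS {m.name}"
        if group_by:
            select = f"{group_by}, {select}"
        sql = f"SELECT {select} FROM {m.table}"
        if m.filters:
            sql += " WHERE " + " AND ".join(m.filters)
        if group_by:
            sql += f" GROUP BY {group_by}"
        return sql

    def notes_for(self, keyword: str) -> list:
        """Return notes whose topic or text mentions the keyword."""
        k = keyword.lower()
        return [n.text for n in self.notes
                if k in n.topic.lower() or k in n.text.lower()]


# Hypothetical content; in practice this would be loaded from reviewed files.
layer = ContextLayer(
    metrics={
        "net_revenue": Metric(
            name="net_revenue",
            table="orders",
            expression="SUM(amount)",
            filters=["status = 'completed'", "is_test = FALSE"],
        )
    },
    notes=[Note(topic="revenue",
                text="Refunded orders are excluded from revenue reporting.")],
)

print(layer.sql_for("net_revenue", group_by="region"))
print(layer.notes_for("revenue"))
```

Asking for a metric that is not defined raises a `KeyError`. In this sketch that is deliberate: the agent is told the metric is not governed instead of being allowed to improvise a definition.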
Modeling with Analytics AI Agents, with a Context Layer
Every business or data analyst gets mediocre results when prompting Claude or Codex against their data stack. The agent might pick up some context by reading the git repos or other metadata it can find. Still, the hard part is teaching it the internals and business context that are unique to each business domain and company. So, how do we make AI agents for data and analytics reliable and accurate for database queries? Do we need to manually copy and paste Google Docs and Markdown files into prompts, or can a context layer provide a more reliable, safe, and governed way to ask an AI assistant for analytics?
These days, it's much easier to ingest almost any number of new data sources with custom-built ETL pipelines, whereas before we had to make hard decisions about what to include and what to leave out. An AI agent, such as Claude or Codex, can combine multiple data pipelines that access the source database via CLIs and the destinations via MCP, API, or CLI. But we still need an API and a repeatable process for updating source data continuously, not just once. And we need to test and verify that the data is correct, potentially more than ever.
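As a rough illustration of what "test and verify" can mean after each load, the sketch below runs three basic checks (row count, NULL keys, duplicate keys) against an in-memory SQLite table. The `check_table` helper and the `orders` table with its `order_id` column are assumptions made up for this example, not part of any tool discussed here. A real pipeline would run similar checks against the warehouse itself.

```python
import sqlite3


def check_table(conn, table, key, min_rows=1):
    """Return a list of failure messages; an empty list means all checks passed.

    `table` and `key` are interpolated into SQL, so they must be trusted
    identifiers from your own config, never user input.
    """
    failures = []

    rows = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if rows < min_rows:
        failures.append(f"{table}: expected at least {min_rows} rows, found {rows}")

    nulls = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {key} IS NULL"
    ).fetchone()[0]
    if nulls:
        failures.append(f"{table}: {nulls} rows with NULL {key}")

    dupes = conn.execute(
        f"SELECT COUNT(*) FROM ("
        f"SELECT {key} FROM {table} WHERE {key} IS NOT NULL "
        f"GROUP BY {key} HAVING COUNT(*) > 1) AS d"
    ).fetchone()[0]
    if dupes:
        failures.append(f"{table}: {dupes} duplicated {key} values")

    return failures


# Hypothetical sample data with one NULL key and one duplicated key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, 5.5), (2, 7.0), (None, 3.0)],
)

failures = check_table(conn, "orders", "order_id")
for message in failures:
    print(message)
```

Returning a list of messages, rather than raising on the first problem, lets a scheduled job report every failed check in one run.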
The challenge remains modeling the data so that it represents what the business is, with data that flows fast but is also correct. And any AI assistant is only as good as the context we give it, and how easily it can read and express context, metrics, models, and knowledge. So, does the context layer solve these problems?
The context layer primarily supports the accuracy of SQL queries, continuous updates to the business context, and governance. Additionally, with a newer context layer, we can include more relevant business insights that are stored internally, usually in...