Nobody trusted our internal dashboards, so we moved them to code

We audited our skills library a few months ago and found twelve dashboards hiding in it. Not dashboards. Skills that built dashboards. Someone needed a view of some data, asked Claude to put it together, got a long HTML page out of it, and then wrapped the whole thing in a skill so others could run it again. Twelve times over, by different people, for different questions.

This is what happens when people want to build with AI but there's no clear way to do it. Metabase was our chosen tool, but it had got into a state: hundreds of dashboards, no governance, and people had quietly given up on it. So they reached for Claude instead, because Claude is where they were already working and it felt like the new way to do this.

But Claude isn't built for dashboards. You get a one-off HTML page. To let anyone else use it you wrap it in a skill. To keep it current you run the skill again. And to make the skill produce live numbers, you teach it to call our internal MCP tools, so every run it goes back out to HubSpot or wherever and pulls the data fresh. Which sounds fine until you have a dozen of these, each with its own query logic, each person's slightly different interpretation of the same metric, all hitting source systems directly. The artefacts people shared in between were worse again, frozen at the moment Claude generated them, out of date by the time anyone opened the link.

So the twelve skills weren't really the problem. They were the symptom of people not knowing what they should be doing.

It went deeper than tooling

The obvious read is that this is a tooling gap. People wanted to build dashboards, the official tool was friction, so give them a better way to build and the problem goes away.

We'd also come to a clear view that a dashboard-skill is the wrong tool, not just an awkward one. We think of skills as something that makes an AI agent smarter at a task, encoding reasoning or knowledge it would otherwise lack. A dashboard is the opposite kind of thing.
It's reporting: a fixed question, the same query, the same answer every time. Wrapping that in a skill and having the AI re-fetch and re-render it on every run is using AI as a reporting engine, which is slower, non-deterministic, and invisible to everyone else. For us that's an anti-pattern with a name.

But there was something underneath all of it, and it's the reason people had drifted off Metabase in the first place: nobody knew where the right number lived.

Over half of our Metabase dashboards had zero views. The ones people did use often disagreed with each other, because there was no governance over what they queried or how a metric was calculated. So you'd open two dashboards that should agree, find they didn't, and have no way of knowing which one to believe.

When that happens enough times, people stop trusting all of them. And then they route around the whole system. They ask Claude an ad hoc question. They pull a number straight out of HubSpot when it should have come from Snowflake. They build a one-off skill that becomes the thirteenth dashboard in disguise.

The bottleneck was never getting an answer. It was knowing whether to believe the answer once you had it. Teams were spending more time working out if a number was right than they spent acting on it.

AI didn't cause this. It poured petrol on it.

This had been a problem for years, quietly. What changed is that everyone at Ably now queries data through Claude. That's mostly brilliant. It's also what tipped this from a problem we lived with into one that bites daily.

When anyone can get a confident, plausible-looking answer out of the data in seconds, every inconsistency in the data layer becomes a vector for a wrong decision made with high confidence.

And our data layer has a lot of inconsistencies, the kind every company that's grown for a decade ends up with. The same account is called ABLY_ACCOUNT_ID in one schema and ACCOUNT_ID in another.
Our unique-devices table has one row per account per country per month, so reading it the obvious way, as if a row were the total, undercounts a busy account by more than a hundred times. And half a dozen of our tables look like they hold revenue, but only two are the ones our finance team actually stands behind. Query one of the others and you get a number that looks completely right and isn't.

A human analyst carries all of that in their head. They know which source to trust when several look right. They know which tables have to be summed rather than read straight. An agent doesn't know any of it, unless you tell it. It reads the field, trusts the value, and hands you a clean answer that's confidently wrong.

So the faster everyone got at pulling answers, the faster we produced confident misinformation. Speed without governance is just a quicker route to the wrong call.

So we fixed the layer underneath first

The first piece was building a governed way for the AI to get at our data. We call it the data warehouse genie. It's a skill,...
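
To make the earlier point about the unique-devices table concrete, here is a minimal Python sketch of the grain problem. Everything in it is hypothetical: the rows, the account IDs, the column order and the function names are invented for illustration and are not our real schema or numbers. It assumes, as described above, that the per-country rows for an account and month are meant to be summed.

```python
# Hypothetical rows at the grain described in the article:
# one row per account, per country, per month. Values are made up.
ROWS = [
    # (account_id, country, month, unique_devices)
    ("acct_1", "GB", "2024-01", 120),
    ("acct_1", "US", "2024-01", 300),
    ("acct_1", "DE", "2024-01", 80),
    ("acct_2", "GB", "2024-01", 50),
]


def naive_devices(rows, account_id, month):
    """The mistake: treat the first matching row as the account's total."""
    for acct, _country, row_month, devices in rows:
        if acct == account_id and row_month == month:
            return devices
    return 0


def summed_devices(rows, account_id, month):
    """Respect the grain: add up every country row for the account and month."""
    return sum(
        devices
        for acct, _country, row_month, devices in rows
        if acct == account_id and row_month == month
    )


if __name__ == "__main__":
    print(naive_devices(ROWS, "acct_1", "2024-01"))   # 120
    print(summed_devices(ROWS, "acct_1", "2024-01"))  # 500
```

Both functions return a plausible-looking number, which is the trap: nothing in the naive result signals that it covers only one country. The gap here is small because the sample has three rows; the more countries an account is active in, the wider it gets.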
