You can't rely on LLMs to understand the grain of your data


Bioinformatics Zen

A blog about bioinformatics and mindfulness by Michael Barton.

When I generate an analysis notebook figure or table with an LLM I just want to move forward with the result. I don’t want to spend time grading the LLM’s homework. There’s no reward for checking the analysis of an LLM — emotionally or from my colleagues. Analysis graphs and figures are what I get paid to generate, and the expectation with LLMs is that I generate them faster. Even before LLMs became much better around November 2025, I knew that faster, more responsive analysis gets rewarded over slower, more thoughtful analysis. That’s been true of the industry as long as I’ve been in it. The speed at which LLMs can generate results, and the accompanying expectations, have just compounded this.

I’ve noticed that the pre-LLM friction of having to write analysis myself was a form of protection. Having to type everything made me think about what I was doing without conscious effort. It was a process running in the background. Now with LLMs that background thinking is gone. I have to do it consciously, and I’m being nudged by the LLM not to. Even before LLMs became good enough, I usually felt that I should spend more time thinking ahead.

Here’s a concrete example using publicly available real e-commerce data (Olist, 100K orders from a Brazilian e-commerce startup). I gave an LLM this data and asked:

Show an example SQL query that would total revenue by product category and payment type

Figure: Claude chat showing a prompt asking for total revenue by product category and payment type, with two SQL query attempts and notes about fan-out and double-counting risks.

This is the kind of question an analyst might ask. Category comes from the items side of an order (order_items joined to products). Payment type lives in the payments table. Both are keyed to order_id, so the natural SQL joins them directly.

The LLM identified a potential fan-out issue, where the join creates additional rows at the wrong grain. The grain of the data, what each row represents, matters here. The orders table has one row per order. The payments table has one row per payment charge: an order paid with a credit card and two vouchers has three rows. Join that naively to a multi-item order and the payment values multiply, inflating the revenue.

The LLM caught this, but its fix introduced a subtler problem. The cognitive danger is that because the LLM identified one grain issue, my unconscious bias is to assume there are no other issues in the generated analysis, especially when the LLM sells it in very positive language. However, if I checked this analysis myself I would find a second subtle bug that affects 2.3% of orders. Not impactful perhaps, but incorrect, and presented confidently.
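The fan-out described above can be reproduced on a toy example. This is a minimal sketch, not the query from the chat: the rows are invented for illustration, and the two tables are a simplified, hypothetical stand-in for the Olist schema (category is stored directly on order_items to keep the example short).

```python
import sqlite3

# Toy data invented for illustration; a simplified stand-in for the Olist schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (order_id TEXT, category TEXT, price REAL);
    CREATE TABLE payments (order_id TEXT, payment_type TEXT, payment_value REAL);
    -- One order with two items (60 + 40 = 100)
    INSERT INTO order_items VALUES ('o1', 'toys', 60.0), ('o1', 'books', 40.0);
    -- paid with two charges (70 + 30 = 100)
    INSERT INTO payments VALUES ('o1', 'credit_card', 70.0), ('o1', 'voucher', 30.0);
""")

# Revenue at the payment grain, before any join.
true_total = conn.execute("SELECT SUM(payment_value) FROM payments").fetchone()[0]

# Naive join on order_id: every item row pairs with every payment row.
naive_rows, naive_total = conn.execute("""
    SELECT COUNT(*), SUM(p.payment_value)
    FROM order_items AS i
    JOIN payments AS p ON p.order_id = i.order_id
""").fetchone()

print(true_total)   # 100.0
print(naive_rows)   # 4
print(naive_total)  # 200.0
```

Two items times two charges gives four joined rows, so the order's revenue is counted twice. How revenue should be allocated across category and payment type is a modelling decision that this sketch does not attempt to settle.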

I don’t have any easy answers for this, but I don’t think it can be solved by better prompting.

Unless the analysis ask is very simple, I always start with planning mode. Planning mode adds a little helpful friction back in: reading the plan. While I’m reading, I’m thinking and not doing. Anyone who’s ever reviewed someone else’s code knows that the perspective is different. Using planning mode is a good trigger for this shift. Specifically, when I use planning mode I have a Claude exit hook, which asks the LLM questions like:

What are the exit criteria, what does success look like?

What are the invariants and grain of the data?

What are the potential failure modes?

All of these questions are about fighting bias, mine and the LLM’s. Kahneman might call this System 1 versus System 2. The LLM wants to think fast. I’m trying to think slow. Including these questions almost always shows a gap in the LLM’s thinking — to the extent it makes me worried about code I generated before adopting this habit. These questions often surface assumptions that I didn’t know I, or the LLM, was making.
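The question about invariants and grain can also be turned into a check that runs before any join. The sketch below is one possible shape for such a check, assuming the data sits in SQLite; grain_report and is_one_row_per are hypothetical helper names, and the rows are invented.

```python
import sqlite3

def grain_report(conn, table, key):
    """Return (row count, distinct key count) for a table.

    Table and column names are interpolated directly into the SQL,
    so only pass trusted names.
    """
    rows, keys = conn.execute(
        f"SELECT COUNT(*), COUNT(DISTINCT {key}) FROM {table}"
    ).fetchone()
    return rows, keys

def is_one_row_per(conn, table, key):
    """True when the table has exactly one row per distinct key value."""
    rows, keys = grain_report(conn, table, key)
    return rows == keys

# Toy data invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id TEXT);
    CREATE TABLE payments (order_id TEXT, payment_type TEXT, payment_value REAL);
    INSERT INTO orders VALUES ('o1'), ('o2');
    INSERT INTO payments VALUES
        ('o1', 'credit_card', 70.0),
        ('o1', 'voucher', 30.0),
        ('o2', 'credit_card', 50.0);
""")

orders_grain = grain_report(conn, "orders", "order_id")
payments_grain = grain_report(conn, "payments", "order_id")

print(orders_grain, is_one_row_per(conn, "orders", "order_id"))      # (2, 2) True
print(payments_grain, is_one_row_per(conn, "payments", "order_id"))  # (3, 2) False
```

A table that fails the check is not necessarily wrong. It only means the table is not at order grain, so joining it on order_id alone can multiply rows.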

Going back to the beginning of this article — working with LLMs has become about managing my mental effort as a limited resource. The harder it is to check the work of the LLM, the less likely I am to do it. And the faster the LLM generates, the harder it gets.

Appendix

You don’t need to read this appendix to understand the point I’m making in this article. This section is here if you want to get into the details of how subtle bugs can show up when the grain of the data is misunderstood. This is exactly the type of issue that the current generation of frontier LLMs still seems to struggle with. I didn’t get a chance to try with Fable, but perhaps someone could try the original prompt above and see how it performs.

Table 1: Three tables in the Olist data at three different grains.

| Table | Rows | Unique Orders | Grain |
|---|---|---|---|
| orders | 99441 | 99441 | One row per order |
| order_items | 112650 | 98666 | One row per item in an order |
| payments | 103886 | 99440 | One row per payment charge |
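The counts in Table 1 already hint at where a join can go wrong. As a rough sketch, the arithmetic below derives the average rows per order for each table and how many orders have no rows in it. The second figure assumes every order_id in order_items and payments also appears in orders, which the table alone does not confirm.

```python
# Row counts and distinct order counts copied from Table 1.
table_1 = {
    "orders": (99441, 99441),
    "order_items": (112650, 98666),
    "payments": (103886, 99440),
}

total_orders = table_1["orders"][1]

summary = {}
for name, (rows, unique_orders) in table_1.items():
    rows_per_order = rows / unique_orders
    # Assumption: order_ids in this table are a subset of those in orders.
    orders_absent = total_orders - unique_orders
    summary[name] = (rows_per_order, orders_absent)
    print(f"{name}: {rows_per_order:.3f} rows per order, {orders_absent} orders absent")

# orders: 1.000 rows per order, 0 orders absent
# order_items: 1.142 rows per order, 775 orders absent
# payments: 1.045 rows per order, 1 orders absent
```

Any ratio above 1 means a join on order_id can multiply rows, and any absent orders would drop out of an inner join without warning.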

Joining order_items to payments on...
