LLM, meet ML pipeline. ML pipeline, meet your new build step

There is a very specific kind of disappointment that only machine learning can produce. It is the moment when you have finally cleaned the dataset enough that it no longer looks like it was assembled by raccoons fighting in a spreadsheet, you have trained a model that is not embarrassing, the metrics are decent enough to stop people from asking whether this was all a waste of money, and then you realize that the thing is still not particularly good. Not bad. Just aggressively, stubbornly average. The kind of average that makes you stare at feature importances as if they might, out of pity, confess what they are missing.

We had one of those pipelines not too long ago. Nothing exotic, no shiny research project, just a normal piece of applied machine learning. Some structured fields, some text, a few values that were technically optional but in practice missing exactly when they would have been useful, and a model that did more or less what we asked of it, provided we did not ask for too much. In other words, a very normal ML system, which means it was held together by statistics, glue code, and the collective hope that nobody would change the input format again.

Then, as tends to happen these days, somebody suggested putting an LLM in front of it.

Not instead of it, mind you. Just in front of it. As a helper. As a civilized little preprocessing assistant that would take the messy bits, especially the text, and turn them into something our existing model could digest without immediately developing opinions about random whitespace and half-finished sentences. And on paper, that sounded almost suspiciously reasonable, which is always the moment when I become nervous, because in software anything that sounds too reasonable is usually hiding three months of nonsense behind a friendly diagram.

The first experiments were, annoyingly, quite good. We had text fields that people had been trying to tame with rules, regexes, and a level of optimism that should probably be regulated, and the LLM just looked at them and extracted structure in a way that felt almost rude. Intent categories suddenly made sense, descriptions that had previously required a small archaeological expedition could be normalized into something consistent enough to use, and all the ugly language that humans produce when they are in a hurry, annoyed, or both, stopped looking like noise and started looking like usable signal. That is the part people like to show in slide decks, because it is real and it is impressive: you can take a system that was struggling with messy inputs and, with relatively little effort, give it better features than the team would have produced by hand in the same time. Traditional ML absolutely loves that, because traditional ML does not care where your features came from as long as they are useful and do not fall apart the moment somebody writes “cant log in” instead of “cannot login”.
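To make that a little more concrete, here is a minimal sketch of what "LLM as preprocessing assistant" can look like. It is not the pipeline described here: the intent labels, the prompt, the `urgent` flag, and the `fake_llm` stand-in are all assumptions for illustration, and a real setup would pass in an actual model client instead. The point is only the shape: free text goes in, the reply is coerced into a small fixed schema, and what comes out is plain numeric features a classical model can eat.

```python
import json
from typing import Callable

# Hypothetical label set; the article does not list its intent categories.
INTENTS = ("login_problem", "billing", "other")

# Hypothetical prompt; doubled braces are literal braces for str.format.
PROMPT = (
    "Classify the support message into one of {intents} and reply with JSON "
    'like {{"intent": "...", "urgent": true}}.\n\nMessage: {text}'
)


def extract_features(text: str, llm: Callable[[str], str]) -> dict:
    """Ask the LLM for structure and coerce the reply into a fixed schema."""
    raw = llm(PROMPT.format(intents=list(INTENTS), text=text.strip()))
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}

    # Anything outside the known label set collapses to "other".
    intent = parsed.get("intent")
    if intent not in INTENTS:
        intent = "other"

    # One-hot encode so a classical model can consume the result directly.
    features = {f"intent_{name}": int(name == intent) for name in INTENTS}
    features["urgent"] = int(parsed.get("urgent") is True)
    return features


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    message = prompt.rsplit("Message:", 1)[-1].lower()
    if "log in" in message or "login" in message:
        return '{"intent": "login_problem", "urgent": false}'
    return "sorry, I cannot help with that"


if __name__ == "__main__":
    for text in ("cant log in", "cannot login", "where is my invoice"):
        print(text, "->", extract_features(text, fake_llm))
```

With the stand-in model, “cant log in” and “cannot login” end up with identical features, which is exactly the property the downstream model cares about. The defensive parsing is a design choice in this sketch: a reply that is not valid JSON, or names a label outside the set, falls back to `other` rather than raising.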

But of course that is not where the story ends, because if it ended there we would all be living in LinkedIn posts by now, and I refuse to believe reality is that cruel.

The first thing the LLM changed was not the model quality. It changed our speed. That turned out to matter much more. Before, every new idea about feature extraction meant writing code, fitting it into a pipeline, waiting for a run, finding out that the idea was mediocre, and then pretending that this had at least been a valuable learning experience. With an LLM in the loop, a lot of that turned into experimentation at conversational speed. We could ask for candidate labels, summarized fields, normalized categories, rough confidence buckets, little bits of structure that would previously have been too annoying to build “just to try it”. That shifts the economics of iteration completely. You stop protecting every hypothesis as if it were a family heirloom and start trying ten things because trying them is suddenly cheap. It is a very real advantage, and it is probably the biggest one. Not because the LLM is magical, but because many ML teams spend an absurd amount of time wrestling raw material into a shape that a model can tolerate. If you reduce that friction, the whole system gets faster. It is worth noticing where that friction actually lived, though. None of those gains came from putting the LLM inside the model. They came from putting it upstream of the model, in the slow, messy, design-time part of the work, the part that happens long before any production system has woken up and started caring.
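One way to read "upstream of the model" is as an offline build step: run the extraction once over the raw text, store the results, and let training and everything after it read from that table. The sketch below assumes this reading; the content-hash cache, the JSON file, and the names `build_feature_table` and `extract` are illustrative choices, not something the pipeline described here is known to use.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Iterable


def build_feature_table(
    texts: Iterable[str],
    extract: Callable[[str], dict],
    cache_path: Path,
) -> list:
    """Run `extract` once per distinct text and reuse stored results afterwards."""
    # Load earlier results if a previous build left a cache file behind.
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}

    rows = []
    for text in texts:
        # Key by content hash, so identical inputs are only ever extracted once.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = extract(text)  # the slow, expensive LLM-backed call
        rows.append(cache[key])

    # Persist the cache so the next build only pays for new inputs.
    cache_path.write_text(json.dumps(cache, indent=2, sort_keys=True))
    return rows
```

Here `extract` is any function that turns one text into a JSON-serializable dict of features. Keeping the expensive call behind a cache like this is what makes "try ten things" affordable in practice: rerunning the build after a small data change only touches the rows that are actually new, and nothing in the serving path has to wait for a model reply.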

Unfortunately, faster is not the same thing as better, and this is where the AI marketing brochures usually develop a mysterious cough and change the subject.

Because the other thing the LLM gave us was a whole new class of failure modes. Before, if a preprocessing step was wrong, it was often gloriously wrong. A parser crashed. A rule misfired. A field was empty when it should not have...
