Why LLMs Will Not Have Your Next Big Idea

Albert Sikkema - Building Production AI Systems


I have been using LLMs daily for years now, mostly to code, but also for research, writing, and analysis. They are genuinely useful, sometimes spectacularly so. I have built tools, workflows, and entire coding pipelines around them. So this is not an “LLMs are bad” post. It is about something I have been thinking about for a while. An enthusiastic YouTube video about the LLM Wiki pattern got me thinking again about its drawbacks (alongside the obvious benefits) and about why that way of gathering and using data is not good enough for me.

In this post I will take you through two ideas. The first is that LLMs cannot create truly new ideas. The second is that simplifying information at an early stage leads to suboptimal results. My conclusion is that, although we can do magnificent things with LLMs, this technique is not able to solve any real problems and will always result in mediocrity.

The Argument From Mechanism

Next-token prediction works by sampling from a distribution fitted over the training corpus. There is no reasoning faculty that could exceed what the text encodes. The model stores a very high-dimensional interpolation function over token sequences. It produces outputs for inputs not exactly in the training set (that is what makes it useful), but it interpolates within the convex hull of expressed human thought. It does not extrapolate beyond the manifold it was fitted to, and it does not keep learning once training is done.
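As a toy illustration of "sampling from a distribution fitted over the training corpus", here is a minimal sketch using a bigram counter. The corpus, function names, and seed are all made up for this example, and a real LLM generalises far more smoothly than raw bigram counts do. The sketch only shows the shape of the argument: the sampler produces sentences that are not in the corpus, yet every step it takes is one the corpus already contained.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus; it stands in for "the training text".
CORPUS = [
    "the sun heats the sea",
    "the sea cools the air",
    "the air moves the sea",
]

def tokenize(sentence):
    return ["<s>"] + sentence.split() + ["</s>"]

def fit_bigrams(sentences):
    """Count which token follows which: the fitted next-token distribution."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def seen_transitions(sentences):
    """All (previous token, next token) pairs that occur in the corpus."""
    pairs = set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        pairs.update(zip(tokens, tokens[1:]))
    return pairs

def sample(counts, rng, max_len=12):
    """Generate one sequence by repeatedly sampling the next token."""
    token, out = "<s>", []
    for _ in range(max_len):
        followers = counts[token]
        token = rng.choices(list(followers), weights=list(followers.values()))[0]
        if token == "</s>":
            break
        out.append(token)
    return out

model = fit_bigrams(CORPUS)
rng = random.Random(0)  # fixed seed so runs are repeatable
samples = [sample(model, rng) for _ in range(200)]

for words in samples[:5]:
    print(" ".join(words))
```

Some of the printed sentences will be new, in the sense that they do not appear in `CORPUS`. None of them can contain a transition the corpus never showed, which is the toy version of interpolating between known points.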

By a good idea I mean a genuine breakthrough: an idea that is good precisely because it violates the regularities the corpus taught. That is structurally outside the region the model can reach. To the model, the revolutionary idea and the incoherent error look the same: both are low-probability under the learned distribution. There is no mechanism to favour “low-probability because true-and-new” over “low-probability because wrong.”
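To make the point about likelihood concrete, here is a small sketch that scores sentences with an add-one smoothed bigram model. Everything in it is an assumption made for illustration: the corpus is invented, and the labels "new and true" and "new and wrong" are assigned by me, not by the model. It is not evidence about real LLMs. It only shows that a likelihood score takes text as its sole input, so it has no place to record whether an unfamiliar sentence is unfamiliar for a good reason or a bad one.

```python
import math
from collections import Counter

# Hypothetical toy corpus; it stands in for "the training text".
CORPUS = [
    "the sun heats the sea",
    "the sea cools the air",
    "the air moves the sea",
]

def tokenize(sentence):
    return ["<s>"] + sentence.split() + ["</s>"]

pair_counts = Counter()   # how often (prev, next) occurs
prev_counts = Counter()   # how often prev occurs as a left-hand token
vocab = set()
for sentence in CORPUS:
    tokens = tokenize(sentence)
    vocab.update(tokens)
    for prev, nxt in zip(tokens, tokens[1:]):
        pair_counts[(prev, nxt)] += 1
        prev_counts[prev] += 1

def log_prob(sentence):
    """Add-one smoothed bigram log-probability of a sentence.

    The only input is the text itself; truth never enters the score.
    """
    tokens = tokenize(sentence)
    total = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        numerator = pair_counts[(prev, nxt)] + 1
        denominator = prev_counts[prev] + len(vocab)
        total += math.log(numerator / denominator)
    return total

# Labels are ours (assumed for the example); the model never sees them.
candidates = {
    "familiar": "the sun heats the sea",
    "new and true (our label)": "the air heats the sea",
    "new and wrong (our label)": "the sea heats the sun",
}

for label, sentence in candidates.items():
    print(f"{label:28s} {log_prob(sentence):8.3f}  {sentence}")
```

Both unfamiliar sentences score below the familiar one. Whatever ordering the model gives the two of them comes from corpus frequencies alone, so swapping the labels would not change a single number.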

The model’s only notion of “good” is “high-probability under the learned distribution,” which means “resembles what was already thought.” That is anti-correlated with genuine novelty almost by definition. The quality filter is filtering for conventionality, which explains why every LLM output feels competent but unsurprising. Ask any LLM to build you a web app and you will get React, Tailwind, and shadcn/ui. Only if you are an expert, or at least have good general knowledge of the domain (that is, your knowledge is above average), will you spot the things it misses or simply gets wrong.

The Argument From Scale

Now you could argue that there are a lot of good solutions inside an LLM, including the novel ones we need to solve important issues like climate, energy, and water management. They would simply be buried in the vast knowledge of the LLM, waiting to be combined across domains and then extracted. But if recombinational novelty could produce breakthroughs, the current reality would have surfaced them. We have the largest idea-discrimination experiment ever running right now, with millions of LLM instances running daily and humans reading and judging their output. If LLMs lived up to their promise (the “big breakthrough for humanity” speech we have been hearing for a few years now), why do we not see those novel ideas showing up?

The output of those LLMs has a different character: a flood of interpolative novelty (restatements, syntheses, transfers of known methods to adjacent problems) and no extrapolative novelty (fundamental breaks, genuinely new ideas). That bimodal signature is exactly what the mechanism predicts: abundant where the manifold is dense (between known points), absent where it is not (beyond them).

If the good recombinations were reachable but merely unrecognised, millions of human discriminators sifting the output over years would have found them. They have not. The breakthroughs are not hidden in the output. They are simply not there.

Recent research backs this up. Chakrabarty et al. (2025) found that LLM-generated ideas lead to homogeneous outcomes across domains, and Tian et al. (2025) showed that aligned LLMs remain trapped in a “safe attractor basin” created during RLHF, limiting exploratory novelty even with strong creativity prompts. There is a counterpoint from Jiang et al. (2024) showing LLMs can combine existing concepts in ways rated more novel than ideas from NLP researchers, but that is combinatorial novelty (recombining known things), not extrapolative novelty (ideas that break the frame).

The Pipeline Problem

This connects to something I ran into while looking at Karpathy’s LLM Wiki pattern and rohitg00’s v2 extension. A wiki is basically a collection of data points (documents, websites, databases, etc.) that can be connected by theme or subject. This leads to a great overview of the data on a subject, with beautiful graphs that can help you understand the interconnectedness of...
