LLMs give us a way to factorize intelligence

Intelligence | Ankit Maloo

Some problems don’t get solved in your head. They get solved in someone else’s while you watch or read their solution. You might feel it before you understand it. A small calibration. A quiet “Oh!”

You cannot always tell what got them there. Maybe they extracted their experience into understanding more efficiently than you did, or they remembered the solution from last Tuesday. Both explain the outcome. Often both are partially true. You cannot test them apart in any clean way. The best researchers could do was design problems unlikely to have been seen before, measure the outcome, and argue about what the score meant.

This is where we have always been with intelligence. A black box. With no factorization. We watched what came out - solutions, inventions, occasional flashes of genius - and attributed it to some combination of memory, recall, reasoning, and an unexplained thing called intelligence. The whole bundle activated at once. We drew a line in the sand where our understanding gave out, and called the far side “intelligence”. Every IQ test, every SAT, every century of psychometric argument is a long monument to this. A whole discipline built around inferring something from the only data the brain would ever release: the outputs.

The bundle came with its own built-in obstacles. Evolutionary priors, episodic memory, learned procedures, linguistic competence, social intuition, embodied skill, all running on the same substrate, inside the same organism, activated together every time the system did anything interesting. There was no way to pick it apart. Not really. Well, until now.

Competence

Consider what you actually test when you are evaluating someone on a task. In the first instance you are looking at competence - the ability to do the given task. Whether they can write the proof, diagnose the patient, ship the code and so on. Competence is what is observable. Companies and enterprises pay for it. Credentials exist to certify this.

To get to intelligence, you have to go a layer below and infer how they acquired the competence. Did they write the proof because they had memorized it previously? Did they genuinely derive it on the spot from general principles? Did they pattern-match it instinctively? Was it just innate? Learned from a similar problem last year? This is the layer where you are probing the harder questions, and the layer you cannot directly observe. So people designed tests that would eliminate the boring explanations of competence: completely new and unseen problems, novel patterns, constrained contexts, all to find out what intelligence looks like. The entire scientific literature on intelligence testing is a long effort to get a clean read on layer two by controlling for layer one.

LLMs are competent too

Large language models are stunningly competent. This often gets lost in the noise around whether they are really intelligent, but on layer one the answer is not in serious dispute. They write, summarize, translate, code, reason about legal documents, score well on expert exams, and in more domains every month they match or exceed skilled humans at the task. Competence is not in doubt.

The question is what is producing the competence. And here the interesting part is not that we have answered the question, but that we know what goes into their training data. We know how they get to the competence they have.

LLMs are the first system in history that lets us ask the right questions at a deeper layer.

I will come back to LLMs in a minute, but first let me explain what the bundle factorization could look like.

Engine and the Substrate

The cleanest vocabulary for the separation comes from François Chollet. He has been arguing for years that intelligence is not the same as skill. Skill is the ability to perform a task well. Intelligence is the ability to acquire new skills efficiently from minimal experience. This framing draws a line between engine and substrate: the engine is whatever converts experience into generalizable capability, and the substrate is everything else — data, tools, accumulated knowledge, search, the composability of stored skills. A high-conversion engine extracts more structure from less input. A low-conversion engine needs more substrate to reach the same place, and often doesn’t reach it at all. The conversion ratio is the key. Everything else is fuel.
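A toy sketch can make the conversion ratio concrete. It is not Chollet's formal measure: the linear model, the `Learner` class, and every number below are assumptions chosen purely for illustration. It also only captures the "needs more substrate" half of the claim, not the case where a low-conversion engine never arrives at all.

```python
from dataclasses import dataclass


@dataclass
class Learner:
    """Toy learner. All fields are illustrative assumptions, not measurements."""
    name: str
    conversion_ratio: float  # capability gained per unit of experience (the "engine")
    substrate: float         # units of data, tools, and stored knowledge available


def capability(learner: Learner) -> float:
    """Assumed linear model: capability = conversion ratio * substrate."""
    return learner.conversion_ratio * learner.substrate


def substrate_needed(learner: Learner, target: float) -> float:
    """Substrate this learner would need to reach a target capability."""
    return target / learner.conversion_ratio


# Two hypothetical learners given the same substrate.
high = Learner("high-conversion", conversion_ratio=0.5, substrate=8.0)
low = Learner("low-conversion", conversion_ratio=0.125, substrate=8.0)

for learner in (high, low):
    print(learner.name, "reaches capability", capability(learner))

# How much substrate the weaker engine needs to match the stronger one.
target = capability(high)
print("low-conversion needs", substrate_needed(low, target), "units to match")
```

With the same substrate, the high-conversion learner ends up further along. The low-conversion learner can close the gap in this model only by consuming more substrate, which is the sense in which everything other than the ratio is fuel.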

Stay with the engine-and-substrate frame for a moment, because it does more work than it first appears to.

Human brains have been anatomically stable for about ten thousand years. Civilization has obviously progressed. It has accelerated, and the acceleration is accelerating too. Most of the raw increase in human capability has happened in the last six hundred years, which is also the period after the printing press made stored knowledge cheap and composable at scale[1]. Arguably, the engine stayed the same and the substrate changed. Writing added substrate. Mathematics added...
