Shannon Got AI This Far. Kolmogorov Shows Where It Stops




Vishal Misra

14 min read · Mar 7, 2026


This post previewed a conversation I recorded with Martin Casado for the a16z podcast. The ideas here came up in that discussion — consider this the written version.


Image caption: A map that knows everything it has seen, and nothing beyond its edge. The equation above it was not found by fitting curves. It was found by asking what kind of universe would make the anomalies disappear.

Here is a question that sounds like a trick but isn’t: is the number pi simple or complex?

Your intuition probably says complex. The digits go on forever without repeating. No pattern jumps out. If someone read you the digits one by one, you could not predict the next one any better than chance. By this measure, the measure of statistical surprise, of how much each new digit tells you, pi is maximally complex. Incompressible. Irreducible.

But here is another answer: pi is trivially simple. The entire infinite sequence is generated by a program you could write in four lines.

4 * (1 - 1/3 + 1/5 - 1/7 + 1/9 - ...)

That is it. Every digit, forever, from a handful of symbols. By this measure, the length of the shortest program that generates the sequence, pi has tiny complexity. It is one of the simplest objects in mathematics.

These two measures have names. The first is Shannon entropy, after Claude Shannon’s foundational work on information theory (yes, Anthropic’s flagship model is named after him). The second is Kolmogorov complexity, after Andrei Kolmogorov’s work on algorithmic information theory. For most sequences you encounter in everyday life, they agree reasonably well. For pi, they disagree completely. And that disagreement, it turns out, is the key to understanding what current artificial intelligence can and cannot do.
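To make the two measures tangible, here is a minimal Python sketch of the pi example. It is an illustration under stated assumptions, not a formal measurement of either quantity. All function names are hypothetical. The Leibniz series from the text converges far too slowly to yield many digits, so the digits are produced with Machin's formula in integer arithmetic instead (a swapped-in technique, still a very short program). The "Shannon" side is approximated by the empirical entropy of single-digit frequencies, and `zlib` stands in for a general-purpose statistical compressor.

```python
import math
import zlib
from collections import Counter


def leibniz_pi(terms):
    """Partial sum of 4 * (1 - 1/3 + 1/5 - ...), the series quoted in the text."""
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(terms))


def arctan_inv(x, unity):
    """Integer approximation of unity * arctan(1/x) from its Taylor series."""
    term = unity // x
    total = term
    x_squared = x * x
    divisor = 3
    sign = -1
    while term:
        term //= x_squared
        total += sign * (term // divisor)
        sign = -sign
        divisor += 2
    return total


def pi_decimal_digits(n, guard=10):
    """First n decimal digits of pi (after the leading 3), via Machin's formula.

    Assumption: `guard` extra digits are enough to absorb rounding error
    for the modest values of n used here.
    """
    unity = 10 ** (n + guard)
    scaled = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    return str(scaled // 10 ** guard)[1:]  # drop the leading "3"


def digit_entropy_bits(digits):
    """Empirical Shannon entropy (bits per digit) of single-digit frequencies."""
    counts = Counter(digits)
    n = len(digits)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    digits = pi_decimal_digits(1000)
    print("Leibniz partial sum:", leibniz_pi(100_000))
    print("entropy per digit  :", digit_entropy_bits(digits))
    print("maximum possible   :", math.log2(10))
    print("zlib size in bytes :", len(zlib.compress(digits.encode("ascii"), 9)))
```

On the statistical side, the per-digit entropy sits close to the maximum of log2(10), about 3.32 bits, and a frequency-based compressor gains essentially nothing beyond that. On the algorithmic side, the program that produced every one of those digits is a few hundred characters long and does not grow as you ask for more digits.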

Part 2 of my podcast with Martin Casado

The measure that deep learning optimizes

Modern deep learning, the technology behind Claude, Gemini, GPT, and every other large language model, is trained by minimizing cross-entropy loss. Cross-entropy is a direct descendant of Shannon entropy. When a model trains on text, it is learning to assign high probability to sequences that actually appear in the training data and low probability to sequences that don’t. It is, at its core, a very sophisticated machine for learning statistical regularities.

This is not a criticism. Shannon entropy is the right measure for compression, communication, and prediction. And deep learning does these things extraordinarily well. The statistical structure of human language is staggeringly rich, and modern models capture it with a precision that continues to astonish even their creators.

But Shannon entropy and Kolmogorov complexity are measuring fundamentally different things. Shannon entropy measures the statistics of outputs. Kolmogorov complexity measures the structure of the generating process: the program, the mechanism, the cause.

Deep learning learns the outputs. It does not learn the program.

This distinction sounds abstract. The story of Albert Einstein and a failed experiment from 1887 makes it concrete.
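Before turning to that story, the Shannon-side quantities in this section can be sketched in a few lines of Python. This is a toy illustration, not how any production model is trained: the three-outcome distribution `data`, the function names, and the plain gradient-descent loop are all assumptions made for the example. It shows that the cross-entropy between the data distribution and a model is never lower than the data's own Shannon entropy, and that minimizing it pulls the model's probabilities toward the observed statistics.

```python
import math


def entropy_bits(p):
    """Shannon entropy H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def cross_entropy_bits(p, q):
    """Cross-entropy H(p, q) in bits: expected surprise of model q on data p."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


def softmax(logits):
    """Turn unnormalized scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fit_by_cross_entropy(p, steps=2000, lr=0.5):
    """Gradient descent on cross-entropy over softmax logits.

    The gradient of the cross-entropy (in nats) with respect to the
    logits is simply q - p, so each step nudges the model toward the data.
    """
    logits = [0.0] * len(p)
    for _ in range(steps):
        q = softmax(logits)
        logits = [z - lr * (qi - pi) for z, qi, pi in zip(logits, q, p)]
    return softmax(logits)


if __name__ == "__main__":
    data = [0.5, 0.25, 0.25]          # hypothetical next-token statistics
    uniform = [1 / 3, 1 / 3, 1 / 3]   # an untrained model
    print("entropy of data        :", entropy_bits(data))
    print("cross-entropy, uniform :", cross_entropy_bits(data, uniform))
    print("fitted model           :", fit_by_cross_entropy(data))
```

Nothing in the loss asks where the distribution came from. The fitted model matches the statistics of the outputs and carries no representation of the process that generated them, which is the distinction the section draws.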

The experiment that shouldn’t have failed

In 1887, Albert Michelson and Edward Morley set out to measure the speed of the Earth through the luminiferous aether, the invisible medium that physicists of the day believed light must travel through. The logic was simple: if light travels through a medium, and Earth is moving through that medium, then light traveling in the direction of Earth’s motion should appear slightly slower than light traveling perpendicular to it. Exactly like a swimmer going against a current versus crossing it.

Michelson had built an interferometer of extraordinary precision. The experiment should have worked. The signal they expected was small, since Earth’s orbital speed is only about one part in ten thousand of the speed of light, but well within the instrument’s sensitivity.

They found nothing. The speed of light was the same in every direction, regardless of how Earth was oriented relative to its orbit. They repeated the experiment at different times of year. Still nothing.

This was not a minor discrepancy to be polished away. It was a complete null result where a clear positive signal was predicted. The aether, the cornerstone of the prevailing theory of light propagation, appeared not to exist.

The physics community’s response was instructive. Hendrik Lorentz and George FitzGerald proposed that objects physically contract in the direction of motion through the aether, by precisely the right amount to cancel the expected signal. This kept the framework intact. The equations still worked. The aether was preserved, merely rendered conveniently undetectable. It was, in...
