Munich 1991: the Roots of the Current AI Boom
David Ha, Sakana AI (@hardmaru)
Jürgen Schmidhuber, KAUST & IDSIA (@SchmidhuberAI)
18 June 2026
Preface by David Ha
When we look at the massive scale of today’s Artificial Intelligence boom, it is easy to forget that the foundations of this trillion-dollar industry were laid down over 30 years ago in Munich.
Today, the world's top tech companies are investing hundreds of billions into scaling up Large Language Models (LLMs) such as ChatGPT. Yet outside of a few history buffs and old-school folks in the Machine Learning community, people might not realize that virtually every core building block of these modern systems was published within a span of just a few months in 1991. Incredibly, they all emerged from a single lab at the Technical University of Munich led by Jürgen Schmidhuber.
Before that year ended, his team had essentially mapped out the modern era of deep learning. They published the very first Transformer variant (see ChatGPT's "T"), introduced the concept of unsupervised pre-training (ChatGPT's "P"), and pioneered neural network distillation. They also introduced deep residual learning, the centerpiece of both LSTMs and ResNets, which were introduced in the most cited AI papers of the 20th and 21st centuries, respectively. These four techniques power today's most advanced LLMs. Furthermore, they laid the early groundwork for generative adversarial networks, foundational for "Generative AI."
Jürgen’s contributions have deeply shaped my own thinking over the years, from my time at Google Brain to the recursive self-improvement (RSI) research we're currently pushing at Sakana AI. I am especially proud to have helped popularize World Models back in 2018, building directly on concepts his lab introduced in the 1990s.
It is amazing to see how well some of these ideas have stood the test of time, scaling up to be fully embraced by the global AI community! For those interested in the real history of deep learning, Jürgen has put together a detailed timeline below of exactly how these seeds were planted in Munich in 1991.
David Ha, June 2026
Jürgen Schmidhuber's 1991 Timeline, with Annotated References
I am proud of the work my team did in 1991 in my home city, when compute was millions of times more expensive than today [RAW], and of all the great people I worked with there and afterwards. Check out the following key AI publications from TU Munich, dated 26 March 1991 to 31 August 1991.
★ 26 March 1991: the first kind of Transformer (see the T in ChatGPT), now called the unnormalized linear Transformer [ULTRA][FWP0-6][WHO10][DLH]: the predecessor of the normalized quadratic Transformer [TR1]. ULTRA remains important, not least because of its efficiency: its computational cost scales linearly with input size rather than quadratically.
★ 30 April 1991: Pre-Training for deep neural networks (NNs), the P in ChatGPT [UN0][UN1][UN2][UN][DLH]. This enabled very deep learning [WHO5].
★ 30 April 1991: Neural network distillation—central to the famous 2025 DeepSeek "Sputnik" and other Large Language Models (LLMs) [UN0][UN1][UN2][WHO9][DLH].
★ 15 June 1991: deep residual learning with residual connections for very deep NNs [WHO11] (see Sepp Hochreiter's diploma thesis [VAN1]): the core ingredient of Long Short-Term Memory [LSTM1], the most cited AI paper of the 20th century and the basis of the first LLMs in the 2010s (ELMo, ULMFiT).
The most-cited scientific article of the 21st century [MOST25-26] is also about deep residual learning, focusing on a variant of our LSTM-inspired deep residual Highway Net [HW1-25b] that was 10 times deeper than previous feedforward NNs [WHO11][DLH]. Deep residual learning is now being used in virtually all LLMs.
★ 31 August 1991: first peer-reviewed publication [GAN91] on generative & adversarial networks [GAN90-25] for neural world models [WM26,WM26b] trained through artificial curiosity & creativity, now controversially used for deepfakes and other applications of Generative AI [WHO8][DLH].
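The linear-versus-quadratic cost mentioned in the 26 March entry can be illustrated with a small sketch. It is not the 1991 formulation: the key/value/query vocabulary, the function names, and the toy data are assumptions chosen for illustration. The first function keeps a running "fast weight" matrix that is updated by one outer product per input, so the work per step does not grow with sequence length. The second function produces the same outputs by revisiting all earlier inputs at every step, which is the quadratic pattern (here without softmax normalization).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_attention(keys, values, queries):
    """Unnormalized linear attention via a running fast weight matrix W.

    At each step W += outer(value, key), and the output is W @ query.
    Per-step cost is O(d_v * d_k), so total cost is linear in sequence length.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    W = [[0.0] * d_k for _ in range(d_v)]  # fast weights, start at zero
    outputs = []
    for k, v, q in zip(keys, values, queries):
        # Additive outer-product update: W[i][j] += v[i] * k[j]
        for i in range(d_v):
            for j in range(d_k):
                W[i][j] += v[i] * k[j]
        # Read out with the current query
        outputs.append([dot(row, q) for row in W])
    return outputs


def quadratic_attention_unnormalized(keys, values, queries):
    """Same outputs, computed by scoring every earlier key against each query.

    Step t touches t + 1 earlier inputs, so total cost is quadratic in length.
    """
    outputs = []
    for t, q in enumerate(queries):
        out = [0.0] * len(values[0])
        for s in range(t + 1):
            score = dot(keys[s], q)  # unnormalized attention score
            out = [o + score * vi for o, vi in zip(out, values[s])]
        outputs.append(out)
    return outputs


# Hypothetical toy sequence of three inputs (illustrative values only)
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
queries = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

fast = linear_attention(keys, values, queries)
slow = quadratic_attention_unnormalized(keys, values, queries)
print(fast)
print(fast == slow)
```

Both functions agree because W applied to a query equals the sum over earlier steps of each value weighted by the dot product of its key with that query; the linear version simply accumulates that sum once instead of recomputing it.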
As of January 2026, the two most frequently cited papers of all time (with the most citations within 3 years, manuals excluded) are directly based on the work of 1991 [MOST26][MOST][MIR].
In 1991, however, it was already totally obvious that LLM-like NNs alone are not enough to achieve Artificial General Intelligence (AGI). No AGI without mastery of the real world [DLH]! That's why we started working on additional techniques required to achieve AGI, e.g., planning with adaptive world models [PLAN1-6][WM26,WM26b] created by artificial scientists [AC] (since 1990 at TU Munich), meta learning & recursive self-improvement (since 1987) [META1][META], and others [DLH][AIB].
Around the same time, Munich was also the origin of the first self-driving cars in traffic [AUT] (by Ernst Dickmanns's team), going up to 175 km/h. The city was truly the epicenter of AI. In the past 3 decades, however, most of commercial AI has shifted to...