Learning Abstractions: A Conversation with Yann LeCun

An open access publication of the American Academy of Arts & Sciences

Winter/Spring 2026

Authors

Yann LeCun and James M. Manyika

To Dædalus issue

Author Information

Yann LeCun is Executive Chairman of Advanced Machine Intelligence (AMI) Labs and the Jacob T. Schwartz Professor of Computer Science, Data Science, Neural Science, and Electrical and Computer Engineering at New York University. He previously served as Chief AI Scientist for Meta’s Fundamental AI Research (FAIR) team. He has recently published in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, International Conference on Learning Representations 2025, and Advances in Neural Information Processing Systems.

James Manyika, a Member of the American Academy since 2019, is SVP at Google Alphabet and President for Research, Labs, Technology, and Society.

James Manyika. Yann LeCun is widely considered one of the godfathers of the modern era of artificial intelligence. His pioneering research with Geoffrey Hinton and Yoshua Bengio on deep learning won them the 2018 Turing Award, considered the Nobel Prize for computer science, and in 2025, the three were awarded the Queen Elizabeth Prize for Engineering. Yann has long focused on foundational and scientific advances in AI, from biologically inspired convolutional neural networks and graph transformer networks to energy-based models for machine learning and world models. In this dialogue, we focus on foundational advances in AI, what he sees as progress to date, the limitations of current mainstream approaches, and what’s needed to further advance AI and benefit science, including the role of scientists and the importance of open science.

Manyika. The last decade in artificial intelligence has been extraordinary. How would you characterize the progress? What has surprised you?

Yann LeCun. I’m actually going to go back fifteen years to when Geoffrey Hinton, Yoshua Bengio, and I were trying to rekindle interest in neural networks, which we rebranded as “deep learning.” And we were struck by how fast it was adopted: within eighteen months of the first papers showing that deep learning could improve speech recognition, basically every mobile speech recognition system was using a neural network.[1] This was my first exposure to how quickly technology is widely adopted when it actually brings something to the table.

There was a similar phenomenon in 2013 when convolutional neural networks became the rage for image recognition. Within months they were widely deployed, and people got excited about the prospect of driving assistance and autonomous driving. Within a few years, we had automatic emergency braking systems in cars, all using convolutional neural networks. These systems have become mandatory in Europe, leading to a 40 percent reduction in frontal collisions.[2] This technology was saving lives, and it made me proud.

The year 2015 was more about natural language processing (NLP). Yoshua’s lab made people pay attention to attention mechanisms, which direct deep-learning models to focus their attention on the most relevant data.[3] Then “Attention Is All You Need,” which introduced the transformer architecture, surprised quite a few by building large networks as stacks of associative memory modules based on so-called self-attention that compares inputs to each other.[4]

I had been advocating for self-supervised learning (SSL) for many years, but I did not expect that its success would be primarily in NLP—though we had hints that “shallow” SSL worked for text embedding, with Word2vec from Google and fastText at Meta.[5] Then there was BERT from Google, which used transformers and attention instead of recurrent neural networks, which a lot of us suspected would not go very far.[6] I wasn’t particularly interested in NLP myself, but it was certainly a very useful thing to do.

What I found surprising was that a simple form of temporal prediction works so well for text. I had been working on temporal prediction for video for at least ten years before that, hoping that if you train a system to predict what’s going to happen in a video, it will understand how the world works: that the world is three dimensional and that objects move independently, for example. Some objects are inanimate and obey laws that are relatively easy to understand and make predictions about. Animated objects are much more difficult to predict. The hope and the whole idea of SSL is that by training a system through prediction, it can understand the underlying structure of what it is trained on. I was focusing on video because I believe that if we can do video, we can do everything. As it turns out, SSL from video is much harder than from text.

Manyika. As you think about where we are with the development of AI’s capabilities,...
