Human-like Neural Nets by Catapulting · Gwern.net
adversarial examples, grokking (NN), savantism
Speculative proposal to create artificial neural nets with human-like performance by high-learning-rate/regularization training of overparameterized NNs to trigger catapulting/grokking. Over-parameterization as a route to true generalization would resolve many outstanding mysteries of artificial versus natural intelligence.
2024-04-21–2026-06-05 · finished · certainty: unlikely · importance: 10
Intelligence, Broadly
Anomalies
Artificial
Sample Inefficiency
Sample Efficiency
Smallness
Superhuman Prediction
Persistent Adversarial Examples
Biological
Human Amnesia
Human Ignorance
Human Intelligence
Need to Sleep
Stage-Wise Development
Slow Development
Human Slowness
Qualitative Differences in Training Dynamics
Sleep
Savantism
Grokking
Cyclical Learning Rates
Adversarial Examples & Isoperimetry
How the Brain Works
Training a Catapulted LLM
Prototyping With Arithmetic
MLPs
Hardware
Prototyping With Image Classification
Adversarial Robustness
Prior Art
Benchmarking
Capabilities
Economic Implications
Alignment
Interpretability
Appendix
Dynamic Grokking
Pondering ≠ Tree Search
Neuroplasticity = Dynamic Evaluation
Repeated Neuroplasticity As Implicit Search
There are many mysteries about deep learning and human intelligence, but we could describe the biggest anomaly this way: why are artificial neural nets smart in such stupid ways, and biological brains stupid but in smart ways?
I propose a major change in deep learning scaling paradigms: the architectural differences between human brains and NNs (particularly LLMs) may be due to a bias-variance tradeoff, where LLMs minimize variance and human brains minimize bias. Human brains do this by deep double descent-style overparameterization and by adopting a scaling strategy of extremely high-learning-rate training of extremely overparameterized models on small, diverse, highly-filtered datasets. This approach would lead to sample-efficiently and compute-efficiently traveling (or ‘catapulting’) to a highly-generalizing human-like basin in the model loss landscape, while performing poorly up until the end and failing to memorize much data.
If true, this would explain a number of odd stylized facts about how humans/NNs perform well/poorly.
Such a ‘catapulted LLM’ would generalize much better than existing NNs, be immune to adversarial attacks, have better economics and be more resistant to cloning, could potentially enable extremely efficient MLP architectures, and by giving true generalization, provide a sturdy foundation for AI safety in the form of useful NNs which are aligned & safe for the right reasons.
This could be feasibly tested by training multi-trillion-parameter models for relatively few steps at high cyclical learning rate schedules, and benchmarking adversarial and hard examples on tasks like arithmetic and small-image classification.
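To make "high cyclical learning rate schedules" concrete, here is a minimal sketch of one common form, a triangular cycle that ramps linearly between a low and a high learning rate. The function name `triangular_clr` and the values `base_lr`, `max_lr`, and `half_cycle` are illustrative assumptions; the proposal itself does not commit to a specific schedule or to these numbers.

```python
def triangular_clr(step, base_lr, max_lr, half_cycle):
    """Triangular cyclical learning rate (illustrative, not the essay's schedule).

    Ramps linearly from base_lr to max_lr over `half_cycle` steps,
    then back down to base_lr over the next `half_cycle` steps, and repeats.
    """
    cycle = step // (2 * half_cycle)            # index of the current full cycle
    x = abs(step / half_cycle - 2 * cycle - 1)  # 1 at cycle edges, 0 at the peak
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


# Hypothetical values: a deliberately high peak rate, cycling every 200 steps.
schedule = [triangular_clr(s, base_lr=0.1, max_lr=1.0, half_cycle=100)
            for s in range(0, 401, 50)]
print(schedule)
```

In a real training loop, the returned value would be assigned to the optimizer's learning rate at every step; how high the peak must be to trigger catapulting is an empirical question that the proposed experiments would have to answer.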
Deep learning has continued to scale up and smash through benchmarks, and has begun to look like it really will be the final AI paradigm and thus, in some sense, the same thing as human ‘intelligence’. To a considerable degree, then, we can regard ‘intelligence’ as solved: intelligence is sufficient compute applied to search over programs (like Turing machines or circuits) to predict or optimize, where the optimal solution is a relatively long program.
(This is a companion piece to “Guardian Angels: LLM Personalization for Productivity and Security”.)
Intelligence, Broadly
A scaling-centric view might be summed up like this:
The Master Synthesis
Anomalies
But this paradigm, as broadly correct as it now seems to be, doesn’t explain everything. We still have many specific problems that this paradigm is too general to explain.
While current NNs, and LLMs in particular, are by far the most human-like AI software ever created, in having human-like strengths and weaknesses, there are a number of anomalies in machine & biological intelligence that have no good answers.
We have many puzzles here, but they all feel connected, somehow.
Artificial
Sample Inefficiency
Why do NNs require Chinchilla-style scaling of data and compute, when humans appear to learn from multiple orders of magnitude less data, and it is increasingly plausible (given various estimates of human-brain equivalents) that they learn from less total compute? Why, as so many connectionist pioneers like Alan Turing expected, do we not train AI like children, with a curriculum and clear developmental stages?
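For a sense of scale, the following sketch works through the usual back-of-the-envelope version of Chinchilla-style scaling. It assumes two widely used rules of thumb that are not stated in this essay: roughly 20 training tokens per parameter at the compute-optimal point, and training compute of about 6 × parameters × tokens FLOPs. The helper name `chinchilla_budget` is hypothetical.

```python
def chinchilla_budget(n_params, tokens_per_param=20.0):
    """Rough compute-optimal data and compute budget for a dense model.

    Assumptions (rules of thumb, not exact laws):
      - about `tokens_per_param` training tokens per parameter
      - training FLOPs of about 6 * parameters * tokens
    """
    tokens = tokens_per_param * n_params
    flops = 6.0 * n_params * tokens
    return tokens, flops


# Example: a 70-billion-parameter model.
tokens, flops = chinchilla_budget(70e9)
print(f"{tokens:.2e} tokens, {flops:.2e} FLOPs")
```

Under these assumptions a 70-billion-parameter model calls for roughly 1.4 trillion tokens and on the order of 6 × 10^23 FLOPs, which is the kind of data budget the human comparison above is set against.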
There are many answers offered, none satisfactory. (And what should we make of theoretical results like Rosenfeld 2021’s “Nyquist...