Human-Like Neural Nets by Catapulting



Human-like Neural Nets by Catapulting · Gwern.net


adversarial examples, grokking (NN), savantism

Speculative proposal to create artificial neural nets with human-like performance by high-learning-rate/regularization training of overparameterized NNs to trigger catapulting/grokking. Over-parameterization as a route to true generalization would resolve many outstanding mysteries of artificial versus natural intelligence.

2024-04-21–2026-06-05 · finished · certainty: unlikely · importance: 10

Intelligence, Broadly

Anomalies

Artificial

Sample Inefficiency

Sample Efficiency

Smallness

Superhuman Prediction

Persistent Adversarial Examples

Biological

Human Amnesia

Human Ignorance

Human Intelligence

Need to Sleep

Stage-Wise Development

Slow Development

Human Slowness

Qualitative Differences in Training Dynamics

Sleep

Savantism

Grokking

Cyclical Learning Rates

Adversarial Examples & Isoperimetry

How the Brain Works

Training a Catapulted LLM

Prototyping With Arithmetic

MLPs

Hardware

Prototyping With Image Classification

Adversarial Robustness

Prior Art

Benchmarking

Capabilities

Economic Implications

Alignment

Interpretability

Appendix

Dynamic Grokking

Pondering ≠ Tree Search

Neuroplasticity = Dynamic Evaluation

Repeated Neuroplasticity As Implicit Search

There are many mysteries about deep learning and human intelligence, but we could describe the biggest anomaly this way: why are artificial neural nets smart in such stupid ways, and biological brains stupid but in smart ways?

I propose a major change in deep learning scaling paradigms: the architectural differences between human brains and NNs (particularly LLMs) may be due to a bias-variance tradeoff, where LLMs minimize variance and human brains minimize bias. Human brains do this by deep double descent-style overparameterization; NNs could do the same by adopting a scaling strategy of extremely high-learning-rate training of extremely overparameterized models on small, diverse, highly-filtered datasets. This approach would lead to sample-efficiently and compute-efficiently traveling (or ‘catapulting’) to a highly-generalizing human-like basin in the model loss landscape, while performing poorly up until the end and failing to memorize much data.
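To make ‘catapulting’ concrete, here is a minimal sketch on a toy problem, not the proposed training recipe. It assumes the two-layer linear model `f = (u · v) / sqrt(width)` fit to a single target of 0, a toy setting commonly used to illustrate the catapult effect; for that model, gradient descent on `u` and `v` reduces exactly to two scalars, the output `f` and the curvature `lam = (|u|² + |v|²) / width`. The function name, the starting values, and the two learning rates are illustrative assumptions. Below the stability threshold `2 / lam`, the loss falls monotonically and the curvature barely moves; above it (but below `4 / lam`), the loss first spikes, the curvature drops, and training then converges in a flatter region.

```python
def train_toy_model(lr, f0=1.0, curvature0=2.0, width=100, steps=200):
    """Gradient descent on the loss f**2 / 2 for f = (u . v) / sqrt(width).

    For this toy model, the updates of u and v reduce exactly to updates of
    two scalars: the output f and the curvature lam = (|u|^2 + |v|^2) / width.
    All default values are illustrative assumptions.
    """
    f, lam = f0, curvature0
    losses, curvatures = [0.5 * f * f], [lam]
    for _ in range(steps):
        shrink = lr * f * f / width
        # Tuple assignment: both right-hand sides use the pre-step values.
        f, lam = (
            f * (1.0 - lr * lam + lr * shrink),
            lam + shrink * (lr * lam - 4.0),
        )
        losses.append(0.5 * f * f)
        curvatures.append(lam)
    return losses, curvatures


# Small learning rate: lr * lam = 1 < 2, ordinary stable descent.
small_losses, small_curvatures = train_toy_model(lr=0.5)
# Large learning rate: 2 < lr * lam = 3 < 4, the catapult regime.
big_losses, big_curvatures = train_toy_model(lr=1.5)

print("small lr: peak loss", max(small_losses), "final curvature", small_curvatures[-1])
print("large lr: peak loss", max(big_losses), "final curvature", big_curvatures[-1])
```

In the large-learning-rate run, the loss climbs well above its starting value before collapsing, and the final curvature ends up below `2 / lr`. That is the sense in which a high learning rate can throw the model out of a sharp region into a flatter one; whether this carries over to large NNs in the way proposed here is exactly what remains speculative.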

If true, this would explain a number of odd stylized facts about how humans/NNs perform well/poorly.

Such a ‘catapulted LLM’ would generalize much better than existing NNs, be immune to adversarial attacks, have better economics and be more resistant to cloning, could potentially enable extremely efficient MLP architectures, and by giving true generalization, provide a sturdy foundation for AI safety in the form of useful NNs which are aligned & safe for the right reasons.

This could be feasibly tested by training multi-trillion-parameter models for relatively few steps with high, cyclical learning-rate schedules, and benchmarking them on adversarial and hard examples in tasks like arithmetic and small-image classification.
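As a rough illustration of what a cyclical learning-rate schedule looks like, the sketch below implements the common ‘triangular’ form, in which the learning rate ramps linearly from a base value up to a peak and back down, over and over. The function name, parameter names, and example values are assumptions chosen for illustration; the text does not specify which cyclical schedule, or which rates, would be used.

```python
import math


def triangular_lr(step, base_lr, max_lr, half_cycle):
    """Triangular cyclical learning rate (illustrative sketch).

    The rate rises linearly from base_lr to max_lr over half_cycle steps,
    falls back to base_lr over the next half_cycle steps, then repeats.
    """
    cycle = math.floor(1 + step / (2 * half_cycle))
    distance_from_peak = abs(step / half_cycle - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - distance_from_peak)


# Sample one full cycle with hypothetical values.
for step in (0, 50, 100, 150, 200):
    print(step, triangular_lr(step, base_lr=0.1, max_lr=1.0, half_cycle=100))
```

The periodic return to a high peak rate is the relevant property here: each cycle gives the optimizer another chance to be knocked out of a sharp region rather than settling into it.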

Deep learning has continued to scale up and smash through benchmarks, and has begun to look like it really will be the final AI paradigm, and thus in some sense the same thing as human ‘intelligence’. To a considerable degree, then, we can regard ‘intelligence’ as solved: intelligence is sufficient compute applied to search over programs (like Turing machines or circuits) to predict or optimize, where the optimal solution is a relatively long program.

(This is a companion piece to “Guardian Angels: LLM Personalization for Productivity and Security”.)

Intelligence, Broadly

A scaling-centric view might be summed up like this:

The Master Synthesis

Anomalies

But this paradigm, as broadly correct as it now seems to be, doesn’t explain everything. We still have many specific problems that this paradigm is too general to explain.

While current NNs, and LLMs in particular, are by far the most human-like AI software ever created, in having human-like strengths and weaknesses, there are a number of anomalies in machine & biological intelligence that have no good answers.

We have many puzzles here, but they all feel connected, somehow.

Artificial

Sample Inefficiency

Why do NNs require Chinchilla-style scaling of data and compute, when humans appear to learn from multiple orders of magnitude less data, and it is increasingly plausible (given various estimates of human-brain equivalents) that they learn from less total compute? Why, as so many connectionist pioneers like Alan Turing expected, do we not train AI like children, with a curriculum and clear developmental stages?
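For a sense of the magnitudes involved in Chinchilla-style scaling, the back-of-envelope sketch below uses two widely quoted rules of thumb, both approximations and both assumptions of this example rather than figures from the text: compute-optimal training uses roughly 20 tokens per parameter, and training costs roughly 6 FLOPs per parameter per token. The function name is hypothetical.

```python
def chinchilla_budget(n_params, tokens_per_param=20):
    """Rule-of-thumb compute-optimal data and compute budget.

    Assumes ~20 training tokens per parameter and ~6 FLOPs per parameter
    per token; both are rough approximations, not exact laws.
    """
    n_tokens = tokens_per_param * n_params
    train_flops = 6 * n_params * n_tokens
    return n_tokens, train_flops


# A 70-billion-parameter model under these rules of thumb.
tokens, flops = chinchilla_budget(70_000_000_000)
print(f"{tokens:.2e} tokens, {flops:.2e} training FLOPs")
```

Because the token budget grows linearly with parameter count under this rule, the compute budget grows quadratically, which is what makes the contrast with human data requirements so stark.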

There are many answers offered, none satisfactory. (And what should we make of theoretical results like Rosenfeld 2021’s “Nyquist...
