What are the real problems of continual learning?

Infinite Faculty

Reflections on catastrophic interference, plasticity, and learning for the future in the era of large language models

Andrew Lampinen
May 29, 2026

Lately, there’s been a surge of interest in continual learning. In particular, there’s been an increasing sense that continual learning is one of the areas where the gap between humans and AI is largest.[1] In this post, I want to explain what continual learning is and review some of the past perspectives on it, including how I think the field focused on what, in retrospect, turned out to be the wrong problems. I’ll then describe what I think are the actual problems that remain, why humans are still superior to AI, and why I think continual learning is so important.

What is continual learning?

In short, continual learning is the ability of a system to keep improving throughout its existence, just as humans can learn new skills and knowledge throughout our lives. This is not an inherent feature of most contemporary AI systems. If you have one conversation with a language model, and then have another conversation on the same topic, the model will have no recollection of the first one or of what you explained during it (unless the AI writes notes for itself, as some memory systems allow). If you work with a language model to write a paper, you will have to put that first paper in context when you want to work on a follow-up. Why don’t these systems support continual learning? To answer this question, let’s start from earlier perspectives.

Catastrophic interference: the original problem

From the early eras of neural network research, it was noted that these networks exhibit catastrophic interference (or catastrophic forgetting): training a network on a new task catastrophically degrades its ability to perform the tasks it was trained on previously. Thus, the networks were generally trained on data whose distribution was constant over training, or sampled uniformly (independent and identically distributed, IID), rather than on a sequence of tasks. This was seen to pose a challenge for using neural networks as cognitive models, since human development and education are fundamentally sequential.

This challenge provided one motivation for theories of the human hippocampus (the episodic memory system) as allowing rapid learning that complements the cortical system: if the hippocampus can rapidly store a memory, the cortical system can then learn it more gradually as it is interleaved with other experiences. This type of interleaving makes the data distribution closer to IID, and thus reduces the problem of catastrophic interference. The idea of replaying past experiences to smooth the data distribution has been very influential in subsequent machine learning work, for example in reinforcement learning.

However, machine learning approaches to replay have generally relied on storing veridical experiences; as such, they tended to seem impractical as the tasks being learned became more numerous and complex. This problem led to an explosion of approaches trying to address catastrophic interference through other changes to architectures or learning objectives: for example, various approaches that preserve weights in proportion to their importance to prior tasks, or that prevent gradients from interfering with prior tasks, so that learning occurs where it interferes the least.
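To make the replay idea concrete, here is a minimal sketch of how stored experiences might be interleaved with new-task data. It is not taken from any particular paper or library: the `ReplayBuffer` class, the `interleaved_batch` helper, and the `replay_ratio` parameter are hypothetical names chosen for illustration, and reservoir sampling is just one assumed way of keeping a bounded store of veridical examples.

```python
import random


class ReplayBuffer:
    """Bounded store of past examples (hypothetical helper for illustration)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of examples ever offered to the buffer
        self.rng = random.Random(seed)

    def add(self, example):
        # Reservoir sampling: every example seen so far has an equal
        # chance of being in the buffer, regardless of when it arrived.
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k):
        # Draw up to k stored examples without replacement.
        k = min(k, len(self.items))
        return self.rng.sample(self.items, k)


def interleaved_batch(new_examples, buffer, replay_ratio=0.5):
    """Mix new-task examples with replayed ones, then store the new ones.

    replay_ratio is the number of replayed examples per new example
    (an assumed knob, not a standard parameter).
    """
    n_replay = int(len(new_examples) * replay_ratio)
    batch = list(new_examples) + buffer.sample(n_replay)
    buffer.rng.shuffle(batch)  # so the learner sees old and new data mixed
    for example in new_examples:
        buffer.add(example)
    return batch
```

For instance, if the buffer already holds 50 examples from an earlier task and a batch of 8 new-task examples arrives with `replay_ratio=0.5`, the learner would train on a shuffled batch of 12 examples, 4 of them replayed. The storage cost of the buffer is exactly the practicality concern raised above: it grows with the number and complexity of tasks one wants to protect.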

Is interference as catastrophic as it seems?

However, various recent works have argued that “interference” in continual learning is not quite as catastrophic as it seems. Often, the knowledge of earlier tasks is preserved within the model’s representations in some sense, and can be recovered relatively easily. For example, one paper finds that internal representations often preserve relatively high linear decodability of earlier task information, even when performance degrades. Several other papers have suggested similarly that interference is strongest at the readout layers, while earlier layer representations preserve information about other tasks — thus allowing relatively easy recovery of earlier tasks by retraining the output layers. On their own, these findings do not completely resolve the interference problem — but they do suggest it may not be fundamental.
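As a toy illustration of the readout-layer point, the sketch below freezes a random feature map and only ever changes a linear readout. This is an idealized assumption, not a reproduction of any of the cited experiments: in a real network the earlier layers also drift, and the claim in those papers is only that they drift less than the output layers. The tasks, the feature map, and all names (`features`, `fit_readout`, `mse`) are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: the "earlier layers" are a fixed random feature map that
# training on later tasks does not change at all.
W_hidden = rng.normal(size=(5, 64))


def features(x):
    return np.tanh(x @ W_hidden)


def fit_readout(h, y):
    # Least-squares linear readout on top of the frozen features.
    w, *_ = np.linalg.lstsq(h, y, rcond=None)
    return w


def mse(h, w, y):
    return float(np.mean((h @ w - y) ** 2))


x = rng.normal(size=(500, 5))
h = features(x)
y_a = np.sin(x[:, 0])      # toy "task A"
y_b = x[:, 1] * x[:, 2]    # toy "task B"

w = fit_readout(h, y_a)
err_a_before = mse(h, w, y_a)

# Learning task B overwrites the shared readout, so task A performance drops.
w = fit_readout(h, y_b)
err_a_after_b = mse(h, w, y_a)

# The features still carry task A information: refitting only the
# readout restores the original task A error.
w_recovered = fit_readout(h, y_a)
err_a_recovered = mse(h, w_recovered, y_a)

print(err_a_before, err_a_after_b, err_a_recovered)
```

In this setup the "forgetting" lives entirely in the readout weights, so recovery is trivially cheap. How closely real networks approximate this picture is an empirical question that the toy cannot answer.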

Loss of plasticity: forward interference

More recently, there has been a newer perspective on a different type of interference: loss of plasticity. While catastrophic forgetting typically works backwards (tasks learned later interfere with earlier tasks), loss of plasticity is instead about forward interference: how earlier learning impairs the ability of the network to learn later tasks. Evidently, for a continual learner this type of interference is just as bad as catastrophic forgetting; a continual learning system cannot afford to lose its ability to learn over time.

However, again, the problems are not universal....
