Google's nested learning, based on the brain's fast‑slow circuits, curbs LLM forgetting
When I first saw the demo, the model seemed to forget yesterday’s facts as soon as it got new data. That’s the old “catastrophic forgetting” problem that has bugged researchers for years. Every time an LLM is updated, its performance can wobble, and teams often end up retraining from scratch or using clunky tricks.
In a recent NeurIPS 2025 paper, covered by The Decoder under the headline “Google’s Nested Learning aims to stop LLMs from catastrophic forgetting,” the authors suggest a two-level training setup: a fast-learning layer sits on top of a slower, steadier one. In theory, the slower part holds onto what the model already knows while the fast part soaks up fresh information. Why look to biology?
Neuroscience hints that the brain works on several timescales, keeping core patterns alive while letting less-important details fade. The new method leans on that idea, hoping that mimicking the brain’s timing gives AI a more reliable memory. The sections below show how the team turned those neural clues into an actual architecture.
How nested learning borrows from the brain
Like many machine learning advances, nested learning is inspired by neuroscience. The brain runs at different speeds: fast circuits handle the present, slower ones consolidate important patterns into long-term memory. Most experiences fade quickly; only a few become lasting memories, thanks to neuroplasticity--the brain's ability to rewire itself while preserving essential information.
The authors contrast this with current LLMs, whose knowledge remains limited to their context window or static pretraining. Nested learning treats every part of an AI model--including the optimizer and training algorithm--as memory. Backpropagation stores links between data and errors, and the optimizer's state, like momentum, acts as memory too.
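To make the "optimizer state is memory" point concrete, here is a minimal NumPy sketch of SGD with momentum--not code from the paper--where the persistent `velocity` buffer carries a decaying trace of past gradients. The function name and numbers are purely illustrative.

```python
import numpy as np

def sgd_momentum_step(weights, grad, velocity, lr=0.01, beta=0.9):
    """One SGD-with-momentum update.

    `velocity` is persistent optimizer state: it blends the current gradient
    with a decaying trace of earlier gradients, which is the sense in which
    the optimizer itself "remembers" past data.
    """
    velocity = beta * velocity + grad   # memory of past gradients
    weights = weights - lr * velocity   # update driven partly by that memory
    return weights, velocity

# Applying the same gradient twice moves the weights further the second time,
# because the velocity buffer has accumulated history.
w, v = np.zeros(3), np.zeros(3)
g = np.array([1.0, -2.0, 0.5])
for _ in range(2):
    w, v = sgd_momentum_step(w, g, v)
print(w, v)
```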
The Continuum Memory System (CMS) splits memory into modules that update at different rates, giving the model temporal depth.
HOPE: Nested Learning in practice
Google's HOPE architecture puts this to work. HOPE uses long-term memory modules called Titans, which store information based on how surprising it is to the model.
It layers different types of memory and uses CMS blocks for larger context windows. Fast layers process live input, slower layers distill what's important for long-term storage, and the system can adapt its update rules as it learns. This goes beyond the typical "pretrain and freeze" model.
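Google's actual HOPE code isn't shown here, so the toy Python sketch below only illustrates the two ingredients just described: memory modules that refresh at different rates (the CMS idea) and a long-term store that keeps only sufficiently "surprising" inputs (the Titans idea). Every name and threshold (`FastSlowMemory`, `slow_every`, `surprise_threshold`) is an invented placeholder, not part of the paper.

```python
import numpy as np

class FastSlowMemory:
    """Toy multi-timescale memory, for illustration only (not Google's HOPE).

    - `fast` is overwritten every step (working memory for live input).
    - `slow` is refreshed only every `slow_every` steps, and only with items
      whose prediction error ("surprise") crossed a threshold.
    """

    def __init__(self, dim, slow_every=10, surprise_threshold=1.0):
        self.fast = np.zeros(dim)
        self.slow = np.zeros(dim)
        self.slow_every = slow_every
        self.surprise_threshold = surprise_threshold
        self.pending = []   # surprising items waiting to be consolidated
        self.step = 0

    def observe(self, x, prediction):
        self.step += 1
        self.fast = x                                  # fast layer tracks the present
        surprise = np.linalg.norm(x - prediction)      # crude surprise signal
        if surprise > self.surprise_threshold:
            self.pending.append(x)                     # candidate for long-term storage
        if self.step % self.slow_every == 0 and self.pending:
            # Consolidate: nudge slow memory toward the surprising items.
            self.slow = 0.9 * self.slow + 0.1 * np.mean(self.pending, axis=0)
            self.pending.clear()

mem = FastSlowMemory(dim=4)
for _ in range(30):
    mem.observe(np.random.randn(4), prediction=np.zeros(4))
print("fast:", mem.fast)
print("slow:", mem.slow)
```

The point of the sketch is only the separation of update rates; in the real system these modules are learned with backpropagation rather than hand-written rules.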
The team tested HOPE on language modeling and reasoning tasks. In tests with 1.3-billion-parameter models trained on 100 billion tokens, HOPE outperformed Transformer++ and newer architectures such as RetNet and DeltaNet.
Can a brain-inspired design actually stop a language model from forgetting? Google’s “nested learning” tries to do just that, borrowing the fast-slow circuit idea from neuroscience so the system has a quick layer for immediate work and a slower one for longer-term storage. The NeurIPS 2025 paper notes that today’s LLMs only keep what fits inside the context window or what’s baked into the static pre-training weights.
Simply making the window bigger or retraining now and then just delays the inevitable loss--like putting a bandage on amnesia. By adding a slower-learning component, the team hopes to carve out a memory trace that survives later updates. Their early tests show less catastrophic forgetting, but those results come from the authors’ own benchmarks.
It’s still unclear if the trick will hold up across the messy, varied tasks we see in real deployments. And nobody has really measured the extra compute cost of juggling two learning rates over the long haul. The idea is certainly appealing, yet we’ll need independent checks before saying it will change practice.
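To give a rough sense of what "juggling two learning rates" can mean in practice, here is a hypothetical PyTorch sketch--nothing the paper prescribes--that assigns a slow backbone and a fast adapter separate learning rates and notes the extra optimizer state that comes along for the ride. The module split, names, and learning rates are assumptions made for illustration.

```python
import torch

# Hypothetical split: a slow, stable backbone and a fast, plastic adapter.
backbone = torch.nn.Linear(512, 512)   # consolidated knowledge, updated gently
adapter = torch.nn.Linear(512, 512)    # quick adaptation to new data

# Two learning rates via optimizer parameter groups.
optimizer = torch.optim.AdamW([
    {"params": backbone.parameters(), "lr": 1e-5},   # slow consolidation
    {"params": adapter.parameters(),  "lr": 1e-3},   # fast updates
])

# Rough bookkeeping: AdamW keeps two extra tensors per parameter, so each
# additional trainable module roughly triples its memory footprint.
n_params = sum(p.numel() for p in adapter.parameters())
print(f"adapter params: {n_params:,}; optimizer state adds ~{2 * n_params:,} more values")
```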
Further Reading
- Introducing Nested Learning: A new ML paradigm for continual learning - Google Research Blog
- Google's Nested Learning aims to stop LLMs from catastrophic forgetting - The Decoder
- Nested Learning - No more Forgetfulness! - DCCoder
- The Architecture of the Mind: Google's "Nested Learning" and The Global Race for Continual Intelligence - ExecuteAI
- Google's Nested Learning Explained: The AI Breakthrough That Ends Catastrophic Forgetting - YouTube (Omid Mohebi)
Common Questions Answered
How does Google's nested learning architecture aim to prevent catastrophic forgetting in LLMs?
Nested learning introduces a hierarchical training scheme that separates fast, immediate processing from slower, long‑term consolidation, mirroring the brain's fast‑slow circuits. By consolidating important patterns into a stable memory layer, the model retains previously learned knowledge while still integrating new data.
What neuroscience concepts inspired the design of nested learning for large‑language models?
The approach draws on the brain's fast‑slow circuit motif, where rapid neural activity handles present stimuli and slower circuits consolidate lasting memories through neuroplasticity. This analogy guides the separation of short‑term updates from long‑term knowledge storage in LLMs.
Why does widening the context window or periodic retraining only postpone information loss in current LLMs?
Current LLMs store information solely in the context window and static pre‑training weights, so expanding the window or retraining merely delays the inevitable overwrite of older knowledge. Without a dedicated long‑term memory mechanism, these tactics cannot fundamentally stop catastrophic forgetting.
What evidence does the NeurIPS 2025 paper provide about the effectiveness of nested learning?
The NeurIPS 2025 paper reports that models using nested learning maintain higher performance on previously learned tasks after successive updates, compared to baseline LLMs that suffer noticeable degradation. These results suggest that the fast‑slow consolidation strategy successfully mitigates forgetting.