
Google's nested learning, based on the brain's fast‑slow circuits, curbs LLM forgetting


Google’s latest tweak to large language models tackles a problem that’s been nagging researchers for years: when a model learns new data, it often overwrites what it already knows. The issue, known as catastrophic forgetting, can make an LLM’s performance wobble after each update, forcing engineers to retrain from scratch or resort to cumbersome workarounds. In its Nested Learning paper, presented at NeurIPS 2025, the team proposes a hierarchical training scheme that layers a fast‑learning component on top of a slower, more stable one.

The approach promises to keep previously acquired knowledge intact while still absorbing fresh information. But why look to biology for a solution? Neuroscience has long shown that the brain processes information on multiple timescales, preserving essential patterns while letting the rest fade.

That observation underpins the new method, suggesting that mimicking the brain’s timing could give AI a steadier memory. The sections below explain how the researchers translate those neural principles into code.

How nested learning borrows from the brain

Like many machine learning advances, nested learning is inspired by neuroscience. The brain runs at different speeds: fast circuits handle the present, slower ones consolidate important patterns into long-term memory. Most experiences fade quickly; only a few become lasting memories, thanks to neuroplasticity, the brain's ability to rewire itself while preserving essential information.

The authors contrast this with current LLMs, whose knowledge remains limited to their context window or static pretraining. Nested learning treats every part of an AI model, including the optimizer and training algorithm, as memory. Backpropagation stores links between data and errors, and the optimizer's state, like momentum, acts as memory too.
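The idea of treating optimizer state and update frequency as forms of memory can be made concrete with a toy example. The sketch below is an illustration of the general multi-timescale idea, not Google's implementation; the fast/slow split, learning rates, and period are assumptions. A "fast" weight block updates every step, with a momentum buffer acting as short-term memory of past gradients, while a "slow" block updates only every tenth step with a small learning rate, so it changes gradually and retains older structure.

```python
# Minimal sketch (not Google's code) of multi-timescale updates:
# a fast block adapts every step, a slow block only every SLOW_PERIOD steps.
import numpy as np

rng = np.random.default_rng(0)

fast_w = rng.normal(size=4)        # hypothetical fast-adapting weights
slow_w = rng.normal(size=4)        # hypothetical slow, consolidating weights

FAST_LR, SLOW_LR = 0.1, 0.01
SLOW_PERIOD = 10                   # slow weights update 10x less often

momentum = np.zeros_like(fast_w)   # optimizer state: itself a kind of memory

def grad(w, x, y):
    """Gradient of squared error for a toy linear model y ~ w @ x."""
    return 2 * (w @ x - y) * x

for step in range(100):
    x = rng.normal(size=4)
    y = x.sum()                        # toy target

    g = grad(fast_w + slow_w, x, y)    # both blocks contribute to the prediction

    # Fast path: updated every step; momentum accumulates recent gradients.
    momentum = 0.9 * momentum + g
    fast_w -= FAST_LR * momentum

    # Slow path: updated rarely and with a small step, so older structure persists.
    if (step + 1) % SLOW_PERIOD == 0:
        slow_w -= SLOW_LR * g
```

The point of the toy is only the timing: the slowly updated block is disturbed far less by any single batch, which is the intuition behind giving a model "temporal depth."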

The Continuum Memory System (CMS) splits memory into modules that update at different rates, giving the model temporal depth.

HOPE: Nested Learning in practice

Google's HOPE architecture puts this to work. HOPE uses long-term memory modules called Titans, which store information based on how surprising it is to the model.

It layers different types of memory and uses CMS blocks for larger context windows. Fast layers process live input, slower layers distill what's important for long-term storage, and the system can adapt its update rules as it learns. This goes beyond the typical "pretrain and freeze" model.
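To illustrate the surprise-based flavor of Titans-style storage, here is a minimal sketch under assumed names; the class, threshold, and learning rate are hypothetical and not taken from the paper. Every input passes through a fast predictor that always adapts a little, but an item is written into the long-term store only when the prediction error, standing in for "surprise," exceeds a threshold.

```python
# Minimal sketch (not the HOPE code) of a surprise-gated long-term store:
# the fast path handles every input, the slow store keeps only surprising items.
import numpy as np

rng = np.random.default_rng(1)

class SurpriseGatedMemory:
    def __init__(self, dim, threshold=0.5):
        self.threshold = threshold
        self.keys, self.values = [], []   # slow, long-term store
        self.w = np.zeros(dim)            # fast predictor, updated every step

    def step(self, x, y):
        pred = self.w @ x
        surprise = abs(y - pred)          # how unexpected this input is

        # Fast path: always adapt slightly to the current input.
        self.w += 0.05 * (y - pred) * x

        # Slow path: only sufficiently surprising items are consolidated.
        if surprise > self.threshold:
            self.keys.append(x.copy())
            self.values.append(y)
        return surprise

mem = SurpriseGatedMemory(dim=8)
for _ in range(200):
    x = rng.normal(size=8)
    y = x[:4].sum()                       # toy target
    mem.step(x, y)

print(f"{len(mem.keys)} of 200 inputs were surprising enough to store")
```

As the fast predictor improves, fewer inputs clear the surprise threshold, so the long-term store grows selectively rather than being overwritten by every new batch.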

The team tested HOPE on language modeling and reasoning. With models at 1.3 billion parameters trained on 100 billion tokens, HOPE outperformed Transformer++ and newer models like RetNet and DeltaNet.


Can a brain‑inspired architecture really keep a language model from forgetting? Google’s nested learning proposes exactly that, borrowing the fast‑slow circuit motif from neuroscience to separate immediate processing from longer‑term consolidation. The NeurIPS 2025 paper points out that current LLMs store nothing beyond the context window or the static pre‑training weights, and that widening the window or retraining periodically merely postpones the loss of information, akin to treating amnesia with a bandage.

By embedding a slower‑learning module, the authors aim to create a durable memory trace that survives subsequent updates. Early experiments suggest the approach mitigates catastrophic forgetting, yet the evidence is limited to the authors’ benchmarks. It is unclear whether the method scales to the diverse tasks and data streams encountered in real‑world deployment.

Moreover, the long‑term computational cost of maintaining dual learning rates has not been quantified. The concept is intriguing, but further independent validation will be needed before its practical impact can be judged.


Common Questions Answered

How does Google's nested learning architecture aim to prevent catastrophic forgetting in LLMs?

Nested learning introduces a hierarchical training scheme that separates fast, immediate processing from slower, long‑term consolidation, mirroring the brain's fast‑slow circuits. By consolidating important patterns into a stable memory layer, the model retains previously learned knowledge while still integrating new data.

What neuroscience concepts inspired the design of nested learning for large language models?

The approach draws on the brain's fast‑slow circuit motif, where rapid neural activity handles present stimuli and slower circuits consolidate lasting memories through neuroplasticity. This analogy guides the separation of short‑term updates from long‑term knowledge storage in LLMs.

Why do widening the context window or periodic retraining only postpone information loss in current LLMs?

Current LLMs store information solely in the context window and static pre‑training weights, so expanding the window or retraining merely delays the inevitable overwrite of older knowledge. Without a dedicated long‑term memory mechanism, these tactics cannot fundamentally stop catastrophic forgetting.

What evidence does the NeurIPS 2025 paper provide about the effectiveness of nested learning?

The NeurIPS 2025 paper reports that models using nested learning maintain higher performance on previously learned tasks after successive updates, compared to baseline LLMs that suffer noticeable degradation. These results suggest that the fast‑slow consolidation strategy successfully mitigates forgetting.