CoCoNuT paradigm expands residual stream for latent‑space, multi‑path reasoning
Why does the residual stream stop at layers and not tokens? That question sits at the heart of the new CoCoNuT (Chain of Continuous Thought) paradigm — a framework that lets large language models wander through latent space, testing several reasoning paths at once instead of locking into a single chain early. While the idea sounds promising, the authors spot a snag they call the “concept bottleneck.” Each pass overwrites intermediate hidden states, so facts computed earlier can disappear as the model digs deeper.
The effect shows up in the numbers: on HotpotQA, vanilla CoCoNuT lands at 10.4% exact match, lagging behind a standard Chain‑of‑Thought baseline’s 11.0%. GSM8K performance also falls off as curriculum depth grows. To patch the leak, the paper introduces Adaptive Gated Continuous Latent Reasoning (AGCLR).
It adds a Gated Concept Stream – a persistent memory governed by three learned gates: write, read and forget. Tested on GSM8K, HotpotQA and ProsQA with GPT‑2 as the backbone, AGCLR consistently nudges scores upward, directly tackling the bottleneck. The code is publicly released.
In the authors' words:
The CoCoNuT (Chain of Continuous Thought) paradigm (Hao et al., 2024) extends this by enabling models to reason in latent space, exploring multiple reasoning paths simultaneously rather than committing to a single chain early on. However, we identify a limitation we term the concept bottleneck. At each reasoning pass, intermediate hidden states are overwritten, causing the model to lose critical facts computed in earlier steps as reasoning depth increases.
On HotpotQA, vanilla CoCoNuT (10.4% EM) fails to improve over the CoT baseline (11.0% EM), and performance degrades with curriculum depth on GSM8K. To address this, we propose AGCLR (Adaptive Gated Continuous Latent Reasoning), which augments CoCoNuT with a Gated Concept Stream: a persistent residual memory maintained across all reasoning passes, controlled by three learned gates: a write gate that commits intermediate facts to memory, a read gate that retrieves relevant prior states, and a forget gate that prunes irrelevant context.
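The write, read and forget gates described above can be illustrated with a toy sketch. This is not the authors' implementation: the article does not give the update equations, so the class names (`Gate`, `GatedConceptStream`), the sigmoid parameterisation, the read-before-write ordering and the update rule `memory = (1 - forget) * memory + write * hidden` are all assumptions chosen for illustration. The gates here are randomly initialised rather than learned, and the code uses plain Python lists so it runs without any dependencies.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class Gate:
    """Element-wise gate: sigmoid(W @ [hidden; memory] + b).

    Hypothetical parameterisation; a trained model would learn W and b.
    """

    def __init__(self, dim: int, rng: random.Random) -> None:
        self.weights = [
            [rng.gauss(0.0, 0.1) for _ in range(2 * dim)] for _ in range(dim)
        ]
        self.bias = [0.0] * dim

    def __call__(self, hidden, memory):
        x = list(hidden) + list(memory)  # concatenate the two vectors
        return [
            sigmoid(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(self.weights, self.bias)
        ]


class GatedConceptStream:
    """Persistent memory carried across reasoning passes (illustrative only)."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.memory = [0.0] * dim
        self.write = Gate(dim, rng)
        self.read = Gate(dim, rng)
        self.forget = Gate(dim, rng)

    def step(self, hidden):
        """Run one pass: read prior memory into the state, then update memory."""
        w = self.write(hidden, self.memory)
        r = self.read(hidden, self.memory)
        f = self.forget(hidden, self.memory)
        # Read: mix previously stored facts back into the current hidden state.
        augmented = [h + ri * m for h, ri, m in zip(hidden, r, self.memory)]
        # Forget, then write: decay old memory and commit the current state.
        self.memory = [
            (1.0 - fi) * m + wi * h
            for fi, m, wi, h in zip(f, self.memory, w, hidden)
        ]
        return augmented


def demo(dim: int = 4, passes: int = 3):
    """Feed a random state through a few passes; returns (hidden, memory)."""
    stream = GatedConceptStream(dim)
    rng = random.Random(1)
    hidden = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(passes):
        # In a real model the transformer would transform `hidden` here.
        hidden = stream.step(hidden)
    return hidden, stream.memory


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is the contrast with the bottleneck: `hidden` is replaced on every pass, while `memory` persists and changes only as far as the forget and write gates allow.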
Why this matters
CoCoNuT pushes LLM reasoning beyond a single chain, letting models explore latent space and keep several hypotheses alive at once. For developers, that could mean APIs that expose a richer set of intermediate representations, potentially reducing the need for hand‑crafted prompt engineering. Founders may see a path to products that handle ambiguous queries without committing prematurely, but the paper flags a “concept bottleneck”: facts computed in early reasoning passes are overwritten as the model reasons deeper.
Researchers will need to probe whether expanding the residual stream across tokens truly scales, or whether memory constraints will limit practical deployment. Is the added flexibility worth the extra compute? The authors report that the bottleneck degrades continuous latent reasoning as depth grows, and their fix is evaluated with GPT‑2 as the backbone, so it remains unclear whether the approach can maintain performance on longer, more complex tasks.
Our take: the idea is intriguing, yet we remain cautious until empirical results demonstrate that the bottleneck can be mitigated beyond the GPT‑2 setting without sacrificing speed or accuracy. Until then, teams should weigh the trade‑offs carefully before building critical systems around this paradigm.
Further Reading
- Training Large Language Models to Reason in a Continuous Latent Space - ICLR 2025
- Why Limit the Residual Stream to Layers and Not Tokens? Persistent ... - OpenReview
- Why Limit the Residual Stream to Layers and Not ... - arXiv
- Reasoning in Continuous Latent Space: COCONUT & Recurrent ... - YouTube
- Chain Of Continuous Thought (Coconut) - Gonzo ML - Substack