Alibaba's Qwen AI Breakthrough: Longer, Smarter Answers
Alibaba's Qwen team adds a method that lengthens AI answers and prompts deeper reasoning
Alibaba’s Qwen team has rolled out a new training algorithm that nudges its language models to produce longer, more reflective replies. While earlier versions tended to give terse answers, the latest iteration stretches output across the full range of possible lengths. The researchers describe the training process in four distinct phases, noting that the model’s earliest-stage behavior of producing shallow planning templates quickly gives way to more nuanced reasoning.
By the time the model reaches later phases, it begins to double-check its own statements, a shift that signals deeper engagement with the task at hand. This is more than a cosmetic tweak: it reshapes how the system tackles problems, moving from blunt assertions to a self-correcting dialogue, and the change shows up directly in the length statistics.
The entire distribution of answer lengths shifts upward, from the shortest to the longest responses, suggesting a fundamental change in how the model approaches problems: the model starts fact-checking itself. The paper lays out four phases the model moves through during training.
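A distribution-wide shift is a stronger claim than a rise in average length: it means even the shortest answers got longer. A minimal sketch of how one might check this, using synthetic token-count samples (illustrative data, not the Qwen team's actual measurements):

```python
import random

random.seed(0)

# Hypothetical response-length samples (token counts) before and after
# training with the new method; log-normal draws stand in for real data.
before = [int(random.lognormvariate(5.0, 0.6)) for _ in range(1000)]
after = [int(random.lognormvariate(5.5, 0.6)) for _ in range(1000)]

def percentiles(lengths, qs=(10, 50, 90)):
    """Return the requested percentiles of a list of lengths."""
    data = sorted(lengths)
    return {q: data[int(q / 100 * (len(data) - 1))] for q in qs}

print("before:", percentiles(before))
print("after: ", percentiles(after))
# If the 10th, 50th, and 90th percentiles all move up, the whole
# distribution shifted, not just the mean.
```

Comparing percentiles rather than means is what distinguishes "the model sometimes rambles" from the paper's claim that all responses lengthen.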
Early on, it churns out shallow planning templates: basically outlines with no real math that end in a hallucinated answer. In the second phase, where DAPO-trained models stay for the rest of training, the model runs a clean linear reasoning chain and stops at the first answer it finds.
Will longer answers mean better insight? The Qwen team’s new algorithm reweights tokens according to their downstream influence, abandoning the uniform treatment of earlier models. As a result, reasoning chains stretch noticeably; the model learns to verify its own intermediate results and to compare alternative solutions.
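The core mechanical change is the loss weighting. The article does not give the exact formula for "downstream influence," so the sketch below uses a hypothetical influence score (mean absolute advantage of the tokens that follow) purely to contrast per-token weighting with the uniform weighting of earlier methods:

```python
import numpy as np

def influence(advantages_after):
    """Hypothetical influence score for a token: how much reward signal
    its continuation carries (mean |advantage| of the later tokens).
    A stand-in for the paper's actual, unspecified formula."""
    if len(advantages_after) == 0:
        return 1.0
    return float(np.mean(np.abs(advantages_after)))

def weighted_pg_loss(log_probs, advantages):
    """REINFORCE-style loss with per-token weights instead of the
    uniform treatment used by earlier models."""
    log_probs = np.asarray(log_probs, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    weights = np.array([influence(advantages[t + 1:])
                        for t in range(len(advantages))])
    weights /= weights.sum()  # normalize so the weights sum to 1
    return -np.sum(weights * advantages * log_probs)

def uniform_pg_loss(log_probs, advantages):
    """Baseline: every token contributes equally to the update."""
    log_probs = np.asarray(log_probs, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    return -np.mean(advantages * log_probs)
```

Under uniform weighting, a long verification step is penalized per token just like anything else; reweighting lets tokens that shape the final answer dominate the gradient, which plausibly encourages the self-checking behavior the paper reports.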
Although the length distribution shifts upward, the paper stops short of quantifying accuracy gains, leaving it unclear whether length translates to correctness. The four-phase training schedule is also only sketched, with early-phase behavior described incompletely.
Consequently, observers must watch for any trade‑offs between verbosity and relevance. The claim that the model starts fact‑checking itself is intriguing, but independent evaluation is still pending. In short, the method introduces a different weighting scheme and produces lengthier, self‑checking outputs, though its practical impact remains uncertain.
Further benchmarks on diverse tasks will be needed to gauge consistency and efficiency.
Further Reading
- Alibaba's new Qwen reasoning AI model sets open-source records - Artificial Intelligence News
- QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning ... - arXiv
- Qwen AI Models 2025: Alibaba's Advanced Multilingual AI Family for ... - Local AI Zone
Common Questions Answered
How does Alibaba's Qwen team's new algorithm change language model response generation?
The new algorithm nudges language models to produce longer, more reflective replies by shifting the distribution of answer lengths upward. This approach moves beyond terse responses, encouraging models to develop more nuanced reasoning and self-verification processes.
What are the four distinct training phases described by the Qwen team's research?
The research outlines four training phases where the model evolves from generating shallow planning templates to more sophisticated reasoning. In the early stages, the model produces initial outlines with limited depth, gradually developing more complex reasoning chains and self-checking mechanisms.
How does the new algorithm change token weighting in language models?
The Qwen team's algorithm reweights tokens according to their downstream influence, moving away from the uniform token treatment in earlier models. This approach allows the model to develop more extensive reasoning chains and compare alternative solutions more effectively.