


Qwen3-Coder-Next: 10× throughput beats Claude‑Opus‑4.5 on SecCodeBench


The new Qwen3‑Coder‑Next model arrives with a claim that’s hard to ignore: an open‑source, ultra‑sparse architecture that can handle repository‑wide tasks at ten times the throughput of comparable systems. For developers who spend hours sifting through codebases, that speed promise feels almost tangible. Yet raw performance isn’t the only metric that matters when you hand a model a codebase riddled with flaws.

Security‑focused evaluations have become a litmus test for whether a tool can be trusted in production pipelines. SecCodeBench, a benchmark designed to measure a model’s ability to spot and fix vulnerabilities, puts that pressure squarely on the table. In this context, Qwen3‑Coder‑Next’s results are noteworthy—not just because it runs faster, but because it claims to keep its eyes on the security side of the equation, even when the prompt omits explicit safety cues.

The numbers that follow illustrate how the model stacks up against the well‑known Claude‑Opus‑4.5 in those high‑stakes scenarios.

Crucially, the model demonstrates robust inherent security awareness. On SecCodeBench, which evaluates a model's ability to repair vulnerabilities, Qwen3-Coder-Next outperformed Claude-Opus-4.5 in code generation scenarios with a 61.2% success rate. Notably, it maintained high scores even when provided with no security hints, indicating it has learned to anticipate common security pitfalls during its 800k-task agentic training phase.

In multilingual security evaluations, the model also demonstrated a competitive balance between functional and secure code generation, outperforming both DeepSeek-V3.2 and GLM-4.7 on the CWEval benchmark with a func-sec@1 score of 56.32%.

Challenging the proprietary giants

The release represents the most significant challenge to the dominance of closed-source coding models in 2026. By proving that a model with only 3B active parameters can navigate the complexities of real-world software engineering as effectively as a "giant," Alibaba has effectively democratized agentic coding.
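For readers unfamiliar with the func-sec@1 metric cited above, the scoring idea can be sketched in a few lines. This is a minimal illustration assuming, as the metric's name suggests, that a problem counts as passed only when the first generated sample clears both the functionality tests and the security tests; the function and data shapes here are illustrative, not CWEval's actual API.

```python
# Illustrative sketch of a func-sec@1-style score: a problem passes only
# if the FIRST generated sample is both functional and secure.
# The (functional_pass, security_pass) tuple format is hypothetical,
# not CWEval's actual data format.

def func_sec_at_1(results):
    """results: list of (functional_pass, security_pass) booleans,
    one pair per benchmark problem, for the first sample only."""
    if not results:
        return 0.0
    passed = sum(1 for func_ok, sec_ok in results if func_ok and sec_ok)
    return passed / len(results)

# Example: three problems; only the first passes both checks.
score = func_sec_at_1([(True, True), (True, False), (False, True)])
print(f"func-sec@1 = {score:.2%}")  # → func-sec@1 = 33.33%
```

The joint requirement is what makes the metric stricter than a plain pass@1: code that works but ships a vulnerability, or code that is safe but wrong, scores zero.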

The "aha!" moment for the industry is the realization that context length and throughput are the two most important levers for agentic success. A model that can process 262k tokens of a repository in seconds and verify its own work in a Docker container is fundamentally more useful than a larger model that is too slow or expensive to iterate. As the Qwen team concludes in their report: "Scaling agentic training, rather than model size alone, is a key driver for advancing real-world coding agent capability".

With Qwen3-Coder-Next, the era of the "mammoth" coding model may be coming to an end, replaced by ultra-fast, sparse experts that can think as deeply as they can run.

The Qwen3‑Coder‑Next release puts an ultra‑sparse, open‑source model into the hands of "vibe" coders, promising ten‑fold higher throughput on repository‑scale tasks. Alibaba's Qwen team has positioned itself as a notable contributor to the open‑source AI scene, repeatedly delivering models that claim to match or exceed the output of leading proprietary systems. On SecCodeBench—a benchmark that measures a model's capacity to fix vulnerabilities—the new model posted a 61.2% success rate, outpacing Claude‑Opus‑4.5 in the reported code‑generation tests. Moreover, the model retained strong scores even when security cues were omitted, suggesting an inherent awareness of safe coding practices.

Nevertheless, the data stop at a single benchmark and a single comparative figure; broader applicability across diverse codebases and real‑world development pipelines has not been demonstrated. It is also unclear how the reported throughput gains translate into latency or resource consumption on typical hardware. While the results are encouraging, further independent evaluation will be needed to confirm whether the model’s performance holds up outside the controlled testing environment.


Common Questions Answered

How does Qwen3-Coder-Next perform on security vulnerability detection and repair?

The model demonstrated exceptional performance on SecCodeBench, achieving a 61.2% success rate in code generation scenarios that involve vulnerability repair. Notably, it maintained high scores even without explicit security hints, suggesting an inherent understanding of potential security pitfalls developed during its 800k-task agentic training phase.

What makes Qwen3-Coder-Next unique in terms of performance and architecture?

Qwen3-Coder-Next features an ultra-sparse architecture that claims to deliver 10× throughput compared to systems like Claude-Opus-4.5. The model is designed for repository-wide tasks and represents Alibaba's continued effort to create open-source AI models that can compete with or exceed proprietary systems.

What is significant about the model's multilingual security evaluations?

The model demonstrated robust performance across multilingual security evaluations, indicating its ability to detect and repair vulnerabilities across different programming languages. This multilingual capability suggests a sophisticated understanding of code security that transcends language-specific boundaries.