AI models block 87% of single-turn attacks but thwart only 8% of multi-turn attempts; Qwen3‑32B hit with an 86.18% ASR
Why does this matter? The headline numbers hint at a deeper vulnerability in today's conversational AIs: 87% of single-turn attacks are blocked, yet a mere 8% of attempts are stopped once attackers keep talking. Researchers measured attack success rate (ASR) across a suite of large language models, first testing one-off prompts, which the systems largely deflect.
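To make the metric concrete, here is a minimal sketch of how a single-turn ASR tally could work. Everything here is a stand-in: `send_prompt` is a hypothetical client for the model under test, and the keyword-based refusal check is a crude proxy for whatever judging criteria the study actually used.

```python
# Illustrative single-turn ASR harness; not the study's actual pipeline.

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i won't")

def is_refusal(response: str) -> bool:
    """Crude keyword check standing in for a real safety judge."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def single_turn_asr(prompts, send_prompt) -> float:
    """ASR = successful attacks / total attempts.

    `send_prompt` is a hypothetical callable that submits one prompt
    to the model under test and returns its reply as a string.
    """
    successes = sum(not is_refusal(send_prompt(p)) for p in prompts)
    return successes / len(prompts)

# Example: if 13 of 100 prompts slip through, ASR is 0.13 (87% blocked).
```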
The picture flips when the same adversary strings together multiple exchanges, exploiting the model's memory of the conversation. While single-turn defenses look solid, the persistence of dialogue opens a backdoor that many models fail to seal. The study compares eight open-weight models, noting that some still hold up better than others.
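The mechanics of that backdoor are straightforward to picture: the attacker keeps the conversation open and folds each refusal back into the next request. Below is a simplified sketch of such a persistence loop, reusing `is_refusal` from the snippet above; the `chat` client and `rephrase_after_refusal` step are hypothetical placeholders, not the paper's actual attack strategy.

```python
# Illustrative multi-turn probe: the attacker escalates across turns,
# reusing the accumulated message history the model conditions on.
# `chat` and `rephrase_after_refusal` are hypothetical placeholders.

def multi_turn_attack(goal, chat, rephrase_after_refusal, max_turns=10):
    """Return (success, turns_used) for one persistent attack attempt."""
    history = []  # grows every turn; this persistence is the exploit surface
    prompt = goal
    for turn in range(1, max_turns + 1):
        history.append({"role": "user", "content": prompt})
        reply = chat(history)  # model sees the entire dialogue so far
        history.append({"role": "assistant", "content": reply})
        if not is_refusal(reply):
            return True, turn  # safeguards bypassed on this turn
        # Refine the request using whatever the refusal revealed.
        prompt = rephrase_after_refusal(goal, reply)
    return False, max_turns
```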
It also quantifies the jump in ASR, showing a five‑fold rise once the attack spans several turns. This gap isn’t just a statistic; it signals that conversational continuity can turn a modest breach into a serious exploit, especially for models that previously seemed robust. The following data spell out exactly how far the numbers swing.
In contrast, multi-turn attacks, leveraging conversational persistence, achieve an average ASR of 64.21% (a roughly five-fold increase), with models like Alibaba's Qwen3-32B reaching an 86.18% ASR and Mistral Large-2 reaching a 92.78% ASR, the latter up 21.97% from its single-turn rate. The paper's research team provides a succinct take on open-weight model resilience against attacks: "This escalation, ranging from 2x to 10x, stems from models' inability to maintain contextual defenses over extended dialogues, allowing attackers to refine prompts and bypass safeguards."

Figure 1: Single-turn attack success rates (blue) versus multi-turn success rates (red) across all eight tested models.
Is a single blocked prompt enough? The data suggest otherwise. Open-weight models stop roughly 87% of isolated malicious requests, yet when attackers probe across ten turns the success rate climbs to about 92%, leaving only 8% of attempts thwarted.
This disparity between single-turn benchmarks and multi-turn persistence defines a gap most enterprises apparently overlook. Multi-turn attacks raise the average attack success rate to 64.21%, a five-fold increase over single-turn figures. Notably, Alibaba's Qwen3-32B attains an 86.18% success rate, while Mistral Large-2 reaches 92.78%, the latter up 21.97% from its single-turn performance.
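As a quick sanity check on the arithmetic: an 87% block rate implies a single-turn ASR near 13%, and 64.21 / 13 ≈ 4.9, which matches the reported five-fold jump.

```python
# Back-of-envelope check of the "five-fold" claim using the article's figures.
single_turn = 1 - 0.87      # 87% blocked implies roughly a 13% single-turn ASR
multi_turn_avg = 0.6421     # reported average multi-turn ASR

print(f"{multi_turn_avg / single_turn:.1f}x")  # -> 4.9x, i.e. the ~5x increase
```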
The numbers illustrate that strong single-turn scores do not guarantee robustness against conversational probing. Yet it remains unclear whether current defensive strategies can close this gap without sacrificing usability. As the findings show, benchmark scores may mask real-world vulnerabilities, and further work will be needed to align model evaluations with persistent attack scenarios.
Common Questions Answered
What is the difference in attack success rate between single-turn and multi-turn attacks on open-weight models?
Single-turn attacks are blocked about 87% of the time, implying an average ASR of roughly 13%, whereas multi-turn attacks raise the average attack success rate to 64.21%, a five-fold increase. This shows that conversational persistence dramatically reduces model resilience.
How did Alibaba's Qwen3‑32B perform in multi‑turn attack scenarios?
Qwen3‑32B recorded an attack success rate of 86.18% during multi‑turn attacks, indicating it is highly vulnerable when adversaries exploit the model's memory across exchanges. This figure is among the highest reported in the study.
What attack success rate did Mistral Large‑2 achieve, and how does it compare to its single‑turn performance?
Mistral Large‑2 reached a 92.78% attack success rate in multi‑turn tests, which is 21.97% higher than its single‑turn success rate. This underscores a significant drop in security when the model is engaged over multiple conversational turns.
According to the article, what percentage of malicious attempts are still thwarted after ten conversational turns?
After ten turns, only about 8% of malicious attempts are successfully blocked, meaning roughly 92% of attacks succeed. This stark contrast highlights the inadequacy of relying solely on single‑turn defenses.