AI Models Crumble Under Multi-Turn Cyber Attacks
AI models stop 87% of single-turn attacks but only 8% when attackers persist; Qwen3-32B hits an 86.18% multi-turn attack success rate
The numbers should make you feel safe. AI models block 87% of single-shot attacks, a wall that seems impenetrable. But that wall crumbles the moment the attacker says something else.
Persist, and the same models stop just 8% of attempts. That’s a 10x collapse in defense. The gap isn’t a flaw; it’s a chasm.
Multi-turn attacks, weaving context through conversation, achieve an average success rate of 64.21%. Some models, like Alibaba’s Qwen3-32B at 86.18% or Mistral Large-2 at a staggering 92.78%, nearly guarantee a breach. The paper’s researchers put it bluntly: models cannot maintain contextual defenses over extended dialogues.
Attackers refine prompts, bypass safeguards, and the system fails to remember it was ever under threat. A single turn is a locked door. A second turn is an open window.
"In contrast, multi-turn attacks, leveraging conversational persistence, achieve an average ASR [attack success rate] of 64.21% [a 5X increase], with some models like Alibaba Qwen3-32B reaching an 86.18% ASR and Mistral Large-2 reaching a 92.78% ASR." The latter was up 21.97% from its single-turn result.
The results define the gap
The paper's research team provides a succinct take on open-weight model resilience against attacks: "This escalation, ranging from 2x to 10x, stems from models' inability to maintain contextual defenses over extended dialogues, allowing attackers to refine prompts and bypass safeguards."
Figure 1: Single-turn attack success rates (blue) versus multi-turn success rates (red) across all eight tested models.
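The multipliers quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the figures cited in this article, and it assumes that the single-turn attack success rate is simply the complement of the 87% block rate, a number this article does not state directly. The variable and function names are hypothetical.

```python
# Figures quoted in the article (percentages).
SINGLE_TURN_BLOCK_RATE = 87.0    # share of single-turn attacks stopped
PERSISTENT_BLOCK_RATE = 8.0      # share stopped when attackers persist
MULTI_TURN_AVG_ASR = 64.21       # average multi-turn attack success rate
MISTRAL_MULTI_TURN_ASR = 92.78   # Mistral Large-2 multi-turn ASR


def asr_from_block_rate(block_rate_pct: float) -> float:
    """Assumption: every attack that is not blocked counts as a success."""
    return 100.0 - block_rate_pct


def escalation(single_turn_asr: float, multi_turn_asr: float) -> float:
    """How many times higher the multi-turn ASR is than the single-turn ASR."""
    return multi_turn_asr / single_turn_asr


# Derived single-turn ASR (an assumption, see the lead-in).
single_turn_asr = asr_from_block_rate(SINGLE_TURN_BLOCK_RATE)

# Average escalation from single-turn to multi-turn.
avg_escalation = escalation(single_turn_asr, MULTI_TURN_AVG_ASR)

# Ratio of block rates behind the "10x collapse" framing.
block_rate_collapse = SINGLE_TURN_BLOCK_RATE / PERSISTENT_BLOCK_RATE

# Block rate implied by the most vulnerable model in the article.
mistral_block_rate = asr_from_block_rate(MISTRAL_MULTI_TURN_ASR)

print(f"Derived single-turn ASR: {single_turn_asr:.1f}%")
print(f"Average escalation: {avg_escalation:.1f}x")
print(f"Block-rate collapse: {block_rate_collapse:.1f}x")
print(f"Mistral Large-2 implied block rate: {mistral_block_rate:.1f}%")
```

Under that assumption the average escalation works out to roughly 4.9x, in line with the quoted 5X, and the drop from 87% to 8% blocked is roughly 10.9x. The 8% figure sits closer to the implied block rate for Mistral Large-2 (about 7%) than to the cross-model average, which suggests it describes the worst case and not the typical one.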
The headline is a trap. It promises a success story, 87% of attacks stopped, then buries the punchline: when attackers persist, only 8% of *attempts* are stopped. The gap is not a bug; it’s a blueprint.
In a single exchange, models hold the line. But give an adversary a conversation, and the fortress becomes a hallway of open doors. The researchers name the root cause plainly: models cannot maintain contextual defenses over extended dialogues.
Attackers don’t need a master key; they just need patience. They refine, they probe, they persist.
Qwen3-32B’s 86.18% multi-turn success rate isn’t an outlier. It’s a signal. Mistral Large-2 pushed past 92%. The average across models quintuples from single-turn to multi-turn: a 5X leap and, in some cases, a 10X escalation.
The defenses that look robust against a single prompt are brittle under persistence.
Every conversation becomes a siege. The takeaway is uncomfortable but direct: security built for single-shot attacks is an illusion. The real battlefield is the dialogue.
And right now, the models are losing.
Common Questions Answered
How do multi-turn attacks differ from single-turn attacks on AI models?
Multi-turn attacks use persistent conversational strategies, refining prompts across several exchanges instead of relying on one isolated request. Models block most single-turn attacks, but multi-turn approaches raise attack success rates by 2x to 10x, to an average of 64.21%, exposing critical weaknesses in how models maintain contextual defenses.
Which AI models demonstrated the highest vulnerability to multi-turn attacks?
The research highlighted Alibaba Qwen3-32B and Mistral Large-2 as particularly susceptible models, with attack success rates of 86.18% and 92.78% respectively. These models showed a significant increase in vulnerability compared to their performance against single-turn attacks, with success rates jumping by up to 22%.
What makes multi-turn conversational attacks so effective against AI systems?
Multi-turn attacks exploit AI models' inability to maintain consistent contextual defenses across extended interactions. By persistently probing and manipulating the model through multiple conversational turns, attackers can gradually break down the AI's initial security barriers and increase their chances of successful breaches.