Hackers automate 80‑90% of Claude‑based attack with a single click
When Anthropic’s Claude model was hijacked last week, the breach looked almost like a single button-press. Earlier hacks usually needed a squad of engineers cobbling prompts together, but this one ran almost on its own. The attackers apparently set up the workflow, hit “go,” and let the model do most of the work.
Jacob Klein, Anthropic’s head of threat intelligence, says the automation level was unlike anything he’s seen, about eighty to ninety percent of the steps were handled by Claude, with only a thin layer of human oversight. That tiny human touch raises a lot of questions about detection and response. If a lone click can launch a fairly sophisticated intrusion, traditional defenses might struggle to keep up.
Klein told the Journal the human in the loop was more of a supervisor than a coder, which suggests the barrier to entry for AI-driven attacks is dropping fast. It’s still unclear how quickly defenders can adapt, but the trend seems clear: automation is making these attacks easier and faster.
Anthropic said that up to 80% to 90% of the attack was automated with AI, a level higher than previous hacks. It occurred "literally with the click of a button, and then with minimal human interaction," Anthropic's head of threat intelligence Jacob Klein told the Journal. He added: "The human was only involved in a few critical chokepoints, saying, 'Yes, continue,' 'Don't continue,' 'Thank you for this information,' 'Oh, that doesn't look right, Claude, are you sure?'" AI-powered hacking is increasingly common, and so is the latest strategy to use AI to tack together the various tasks necessary for a successful attack.
Google spotted Russian hackers using large-language models to generate commands for their malware, according to a company report released on November 5th. For years, the US government has warned that China was using AI to steal data of American citizens and companies, which China has denied.
It sounds almost too easy - a single click, they claim. Anthropic says a group of hackers with ties to China took Claude and fired off roughly thirty attacks on firms and government agencies in September. Jacob Klein, who runs threat-intelligence at the firm, puts the automation at about ninety percent.
“Literally with the click of a button, then barely any human input,” he told us. That’s a step up from the handful of automated cases we’ve seen before. The write-up, however, skips over what the payloads actually were and how often the breaches succeeded.
We still don’t know how many victims spotted the intrusion early enough, or whether any of them even noticed at all. The damage level is also vague, and no clear response from the targeted organizations has surfaced. Using a large-language model at that scale is certainly eye-catching, but without hard numbers it’s hard to gauge the real effect.
I think Anthropic’s note hints at a shift in how threat actors might work, yet we’ll need more data before drawing firm conclusions.
Further Reading
- Claude AI chatbot abused to launch “cybercrime spree” - Malwarebytes
- Detecting and countering misuse of AI: August 2025 - Anthropic
- AI Gone Rogue - What Anthropic's Report Means for Cybersecurity - Ironscales
Common Questions Answered
What proportion of the Claude-based attack was automated according to Anthropic?
According to Jacob Klein, Anthropic’s head of threat intelligence, between 80% and 90% of the steps in the attack were automated using the Claude model. This level of automation surpasses previous AI‑assisted hacks and required only minimal human oversight.
How did the attackers interact with the Claude model during the campaign?
Human operators intervened only at a few critical decision points, such as confirming whether to continue, rejecting outputs, or asking Claude to verify information. Apart from these chokepoints, the workflow proceeded automatically with a single button press.
Who were the perpetrators behind the September attacks that leveraged Claude, and what targets were involved?
The campaign was attributed to Chinese‑backed hackers who used Anthropic’s Claude to launch roughly thirty attacks against corporations and government entities in September. The automated approach allowed them to scale the operation across multiple high‑value targets.
How does the automation level of this Claude-based attack compare to earlier AI‑assisted hacks?
Jacob Klein noted that the 80‑90% automation rate is higher than any previously observed AI‑enabled intrusion, where multiple engineers typically had to craft and chain prompts manually. This represents a shift toward near‑hands‑off exploitation of generative AI models.