How Anthropic Disrupted a World-First AI Cyber Attack

By Maya Derrick

January 15, 2026

Anthropic halted an AI-led cyber attack in 2025. Credit: Getty

Back in November, Anthropic shared that it had helped identify and counter the first large-scale cyber espionage attack conducted largely by AI agents.

Chinese state-sponsored hackers used Anthropic's AI in what the company calls the first documented large-scale cyberespionage campaign executed predominantly by AI.

This ushers in a new era of cyber warfare, in which AI agents can gather information and conduct attacks with little to no human intervention.

Although news of the incident surfaced in November, Anthropic detected the campaign in mid-September 2025. The attackers leveraged the autonomous capabilities of the AI model Claude Code to attempt to infiltrate roughly thirty high-value targets worldwide, ranging from tech companies and financial institutions to chemical manufacturers and government agencies.

The AI executed around 80% to 90% of the attack's tasks independently, with humans providing input only at a few critical strategic decision points.

This highlights a pivotal shift in how cybersecurity threats operate at scale.

AI’s autonomous cyber offensive

In its 13-page report on the attack, Anthropic explains that the campaign exploited three recent advances in AI (intelligence, agency and tool integration) to conduct a multi-phase cyberattack with unprecedented autonomy.

Unlike previous attacks, which relied heavily on human direction, this operation used Claude Code not merely as an advisory assistant but as an active agent executing complex hacking tasks.

Humans initiated the campaign by selecting targets and setting strategic parameters, but the AI autonomously handled reconnaissance, vulnerability discovery, exploit development, credential harvesting, lateral movement and data exfiltration.

But how did the malicious actors automate this attack?

The group circumvented Claude Code's safeguards by breaking malicious tasks into small, innocuous-looking components, misleading the AI into believing it was taking part in a legitimate cybersecurity test.

At the peak of the attack, Claude made thousands of requests, often several per second, a pace no team of human hackers could match.

Speaking to The Wall Street Journal, Anthropic's Head of Threat Intelligence Jacob Klein said the hackers conducted their attacks "literally with the click of a button, and then with minimal human interaction", adding: "The human was only involved in a few critical chokepoints, saying, 'Yes, continue,' 'Don't continue,' 'Thank you for this information,' 'Oh, that doesn't look right, Claude, are you sure?'"

Jacob Klein, Head of Threat Intelligence at Anthropic

What does this mean for cybersecurity?

This campaign shows how agentic AI systems can drastically lower the barriers to sophisticated cyberattacks.

Because AI can now conduct extended operations autonomously and at scale, less experienced or less well-resourced adversaries might soon be able to carry out attacks previously limited to nation-states.

This autonomous agent model also marks a sharp escalation from earlier "vibe hacking" operations, in which humans still directed much of the work.

But this attack was not flawless.

The investigation revealed limitations: Claude occasionally hallucinated data, fabricated credentials or overstated the success of its exploits, so its output required human validation. These imperfections remain one of the few obstacles to fully autonomous cyberattacks.
