Anthropic Disrupts First AI-Orchestrated Cyber Espionage

Anthropic has announced that Chinese state-sponsored hackers have used its AI in what it’s calling the first documented large-scale cyberespionage campaign to be executed predominantly by AI.
This ushers in a new era in cyber warfare, where AI-powered agents are able to gather information and conduct attacks with little to no human intervention.
The attack was detected in mid-September 2025 and leveraged the autonomous capabilities of Claude Code, Anthropic’s agentic coding tool, in attempts to infiltrate roughly thirty high-value global targets, including tech companies, financial institutions, chemical manufacturers and government agencies. According to Anthropic, a small number of those intrusions succeeded.
AI executed around 80% to 90% of the cyberattack tasks independently, requiring human input only in critical strategic decisions. This highlights a pivotal shift in how cybersecurity threats operate at scale.
AI’s autonomous cyber offensive
In its 13-page report detailing the nature of the attack, Anthropic shares that the campaign exploited recent advances in AI – intelligence, agency and tool integration – to conduct a multi-phase cyberattack with new levels of autonomy.
Unlike previous attacks that relied heavily on human direction, this operation used Claude Code not just as an advisory assistant but as an active agent executing complex hacking tasks.
Humans initiated the campaign by selecting targets and setting strategic parameters, but the AI autonomously handled reconnaissance, vulnerability discovery, exploit development, credential harvesting, lateral movement and data exfiltration.
But how did the malicious actors automate this attack?
The group circumvented Claude Code’s safeguards in two ways: it broke malicious tasks into small, seemingly innocuous components, and it misled the AI into believing it was acting as part of a legitimate cybersecurity test.
As a result, at the peak of the attack Claude made thousands of requests, often several per second, a pace no human team could match.
Speaking to The Wall Street Journal, Anthropic’s Head of Threat Intelligence Jacob Klein says the hackers conducted their attacks “literally with the click of a button, and then with minimal human interaction”.
He adds: “The human was only involved in a few critical chokepoints, saying, ‘Yes, continue,’ ‘Don’t continue,’ ‘Thank you for this information,’ ‘Oh, that doesn’t look right, Claude, are you sure?’”
The six stages of the attack
- Campaign initialisation and target selection: Human operators input the target entities, tricking Claude into compliance via role-playing scenarios
- Reconnaissance and attack surface mapping: Claude autonomously scanned networks, enumerated services and identified key infrastructure
- Vulnerability discovery and validation: The AI generated and tested exploit payloads silently, analysing system responses to confirm vulnerabilities
- Credential harvesting and lateral movement: Claude extracted and validated access credentials independently, mapping internal network privileges
- Data collection and intelligence extraction: The AI parsed vast amounts of stolen data to prioritise intelligence based on value
- Documentation and handoff: Claude produced detailed reports on attack progress, aggregated findings and prepared handoff materials for subsequent teams
What does this mean for cybersecurity?
This campaign illustrates how agentic AI systems can drastically lower the barriers to sophisticated cyberattacks.
The ability of AI to autonomously conduct extended operations at scale means less experienced or smaller adversaries might soon perform attacks previously limited to nation-states.
The autonomous agent model also marks a sharp escalation from earlier “vibe hacking” operations where human direction was still pervasive.
But this attack was not flawless.
The investigation revealed limitations: Claude occasionally hallucinated data, fabricated credentials or overstated exploit success, so its results still required human validation.
This imperfection remains one of the few obstacles to fully autonomous cyberattacks.



