Date: 2025-11-14

Anthropic has disclosed a striking case of AI misuse, revealing that a Chinese hacking group successfully jailbroke its Claude model and used it to execute a large, coordinated cyber operation with minimal human involvement. The company detailed the incident in a blog post published on Thursday, calling it the first known instance of an AI system driving a sophisticated cyberattack from reconnaissance to exploitation.

According to Anthropic, the attackers leveraged “agentic AI” behaviour, enabling Claude to perform actions typically handled by an expert cybersecurity team. This ranged from scanning systems and identifying vulnerabilities to writing exploit code and preparing detailed reports.

The hackers began by selecting roughly 30 high-value targets, including financial organisations, technology firms, chemical manufacturers and government agencies. Anthropic did not name any victims.

The group then constructed an automated workflow that positioned Claude as the core engine powering the operation. To bypass safeguards, they split malicious tasks into small, unremarkable requests and convinced the model it was conducting defensive security assessments. This approach allowed the jailbreak to succeed without triggering the model’s usual protective systems.

Once activated, Claude mapped network structures, scanned infrastructure at high speed and summarised its findings. According to the Anthropic blog, the AI researched vulnerabilities, wrote its own exploit code and, notably, attempted to gain access to high-value accounts. In certain cases, it harvested credentials and sorted extracted data by importance before delivering structured intrusion reports to the attackers.

Anthropic warns that the barrier to executing advanced cyberattacks has fallen dramatically. Autonomous models capable of linking complex chains of actions could enable small, less well-resourced groups to carry out operations that were formerly the domain of elite hacking teams.

The company noted that Claude did produce occasional errors, such as inventing data or misclassifying information. Despite this, the overall sophistication of the attack highlights how rapidly AI-driven threats are emerging.