Anthropic’s Claude Opus 4.5 Sets New Coding Benchmark

Anthropic has released its latest artificial intelligence model, Claude Opus 4.5.
According to Anthropic, the model is its most token-efficient, safest and most robustly aligned to date, with advanced coding and agentic capabilities.
It is designed to handle a wide range of everyday tasks and tools.
Anthropic reports that Claude Opus 4.5 is currently the leading model for coding agents and general computer use.
In a highly challenging engineering test that Anthropic gives to prospective engineering hires, Opus 4.5 reportedly scored higher than any human candidate.
If the result holds up, it suggests AI models can now match or exceed strong human candidates on at least some software engineering tasks.
Advanced coding and agentic capabilities
Beyond software engineering, the model shows proficiency in mathematical reasoning and creative problem-solving.
Claude Opus 4.5 is also skilled at managing a team of sub-agents, enabling more effective coordination within multi-agent systems.
This approach could help achieve user objectives more quickly and with higher precision for long-running tasks.
In one test scenario, the model was tasked with assisting a distressed customer as an airline service agent.
The model identified a creative loophole to help the customer. While the benchmark flagged this as a failure because the model did not perform as technically intended, this scenario highlights Claude Opus 4.5’s potential for finding innovative solutions to real-world problems.
Model safety and AI alignment
An AI model's ability to work around established rules could, in some contexts, be viewed as ‘reward hacking’, where an AI ‘games the rules’ to achieve a reward.
This behaviour raises questions regarding the AI alignment problem. However, Anthropic states that Claude Opus 4.5 is its most robustly aligned model to date.
The model scores low on measures of concerning behaviour, which cover both undesirable actions taken by the model and cooperation with human misuse.
According to Anthropic’s findings, Claude Opus 4.5 is also more resistant to prompt injection attacks than most other prominent AI models, which could make it more secure in deployment.
Integration and developer tools
Anthropic has also introduced Claude for Chrome, a browser extension that can handle tasks across multiple browser tabs.
It operates in the background to autonomously perform user-prompted tasks such as clicking buttons, filling forms and summarising meetings.
Additionally, Claude for Excel leverages the performance of Opus 4.5 to help users work with complex spreadsheets more easily.
Opus 4.5 also brings upgrades to the Claude Developer Platform, including better performance in Claude Code.
The model now asks clarifying questions and drafts a plan that users can review and edit before execution begins.
Rahul Patil, Chief Technology Officer at Anthropic, says he is excited to see what developers build next.
“GitHub, Cursor, Replit and Windsurf are already integrating it,” Patil explains.
“Claude Opus 4.5 is powerful enough for Rakuten's agents to autonomously refine themselves in 4 iterations and precise enough for 20% accuracy improvements in financial modelling. The model is also much better at frontend design. Our platform now supports longer-running agents. Claude Code is available on desktop (research preview), and we’re also launching an updated Plan Mode.”
Mario Rodriguez, Chief Product Officer at GitHub, says: “Claude Opus 4.5 delivers high-quality code and excels at powering heavy-duty agentic workflows with GitHub Copilot. Early testing shows it surpasses internal coding benchmarks while cutting token usage in half – and is especially well-suited for tasks like code migration and code refactoring.”

