How Anthropic’s New Sonnet 4.5 Redefines AI for Developers

Share this article
Share this article
Prioritise Us on Google
Dario Amodei, CEO of Anthropic, launches Claude Sonnet 4.5
Anthropic’s Claude Sonnet 4.5 sets a new benchmark by excelling in coding, multi-hour task execution and developer tools with impressive benchmark results

Software development is emerging as a key front in the AI race, with models increasingly able to generate code and perform system-level operations autonomously.

Anthropic has now introduced Claude Sonnet 4.5, a model built to handle software engineering and computer control workflows.

The launch arrives alongside updates such as new checkpoints in Claude Code, Anthropic’s command-line programming assistant, as well as enhanced features across its consumer-facing apps.

Also included is the Claude Agent SDK, the core infrastructure used by Anthropic to power tools like Claude Code.

The capabilities of Claude Sonnet 4.5

The toolkit equips developers with frameworks for managing memory across workflows, enforcing permission controls and orchestrating collaboration between multiple agents.

“Claude Sonnet 4.5 is our most powerful model to date.”

Anthropic

Anthropic has opened this infrastructure to developers, enabling them to design and deploy their own custom agent systems.

“Code is everywhere. It runs every application, spreadsheet and software tool you use,” Anthropic says.

“Being able to use those tools and reason through hard problems is how modern work gets done.”

Claude Sonnet 4.5 keeps pricing consistent at US$3 per million input tokens and $15 per million output tokens, aligning with the rates set for Claude Sonnet 4.

The model is accessible via the Claude API under the identifier claude-sonnet-4-5.

Claude Sonnet 4.5’s scoring and industry impact

The model achieves a score of 61.4% on OSWorld, a benchmark that tests AI systems on computer tasks. This compares to 42.2% for Claude Sonnet 4, released four months earlier. 

Claude Sonnet 4.5’s results | Credit: Anthropic

On SWE-bench Verified, a benchmark assessing advanced coding performance, Claude Sonnet 4.5 ranks as the leading model among those tested.

Anthropic notes that it can sustain attention on multi-step tasks for more than thirty hours.

The company has also integrated these capabilities into its Chrome extension, rolled out in August to Max subscribers who registered via a waitlist.

Early adopters have since begun sharing outcomes from real-world deployments.

Mario Rodriguez, CPO at GitHub

Mario Rodriguez, Chief Product Officer (CPO) at GitHub says: “Claude Sonnet 4.5 amplifies GitHub Copilot’s core strengths.

“Our initial evals show significant comprehension – enabling Copilot’s agentic experiences to handle complex, codebased-spanning tasks better.”

Eric Wendelin, Tech Lead, Gen AI for Developer Productivity at Netflix

Eric Wendelin, Tech Lead, Gen AI for Developer Productivity at Netflix, adds: “Claude Sonnet 4.5 is excellent at software development tasks, learning our codebased patterns to deliver precise implementations. 

“It handles everything from debugging to architecture with deep contextual understanding, transforming our development velocity.”

Experts across finance, law, medicine, and STEM report that Sonnet 4.5 demonstrates significantly stronger domain-specific knowledge and reasoning skills than earlier models, including Opus 4.1.

Finance experts findings on Sonnet 4.5 | Credit: Anthropic

The matter of safety measures 

Anthropic is rolling out Claude Sonnet 4.5 under its AI Safety Level 3 protections, a framework that pairs advanced capabilities with strict safeguards.

As part of this, the company has deployed classifiers – filters designed to identify prompts and outputs connected to chemical, biological, radiological, and nuclear weapons.

These systems can also flag benign content, but Anthropic reports a tenfold reduction in such false positives since they were first introduced, and a twofold improvement since the launch of Claude Opus 4 in May.

Automated evaluations further show fewer instances of behaviours such as sycophancy, deception, power-seeking, and the reinforcement of delusional thinking.

Anthropic describes Claude Sonnet 4.5 as “our most aligned frontier model yet,” noting that “Claude’s improved capabilities and our extensive safety training have allowed us to substantially improve the model’s behaviour”.

Claude Sonnet 4.5’s misaligned behaviour scores | Credit: Anthropic

To date, Anthropic has added code execution and file creation features directly into its consumer chat applications.

This allows users to build spreadsheets, presentations, and documents without leaving the interface.

The Claude API also incorporates a context editing function and a memory system, enabling agents to sustain operations over longer timeframes.

Youtube Placeholder

Anthropic has also launched a research preview called Imagine with Claude, available to Max subscribers for a five-day trial.

The preview showcases Claude Sonnet 4.5 generating software directly from user prompts, without relying on prebuilt functions or preset code.

Anthropic says it “built Claude Code because the tool we wanted didn’t exist yet”. 

The company adds: “The Agent SDK gives you the same foundation to build something just as capable for whatever problem you’re solving.” 

Company portals

Executives