Microsoft's Bid to Fix the Cloud Complexity Crisis

The new tool “provides the real-time understanding of system behaviour that agents depend on to reason”, says Microsoft’s Brendan Burns. Credit: Microsoft
Microsoft's Azure Copilot Observability Agent correlates logs and metrics into clear insights, saving early adopters like KPMG an estimated 250 engineering hours a month

Eight in 10 (84%) organisations report increased cloud complexity, with 69% saying it is outpacing their current operating models.

That's according to the 2026 Microsoft Azure Agentic AI in Cloud Operations report, which surveyed 250 IT decision-makers.

To bring their operating models back in step with that complexity, organisations need observability.

“It [observability] provides the real-time understanding of system behaviour that agents depend on to reason, adapt and act,” says Brendan Burns, Technical Fellow and Corporate Vice President of Azure Cloud Native and Management Platform at Microsoft. 

“Without a connected view across signals, even the most advanced agents lack the context required to operate reliably.”


Introducing Azure Copilot Observability Agent

To address this gap, Microsoft has introduced Azure Copilot Observability Agent, a new AI-powered capability built on Azure Monitor that provides a unified view of signals across agents, applications, infrastructure and services.

The agent is designed to help cloud operations teams diagnose and resolve incidents faster as software systems become more autonomous and interconnected.

Microsoft says the agent correlates logs, metrics, traces, topology and operational context across environments, giving operators a clear view of what is happening as teams deploy AI models and APIs.

The launch reflects a broader shift from reactive cloud management towards agentic operations, in which AI systems continuously interpret signals, recommend remediation and help improve system resilience over time.

“As software becomes increasingly agentic, the challenge is no longer just managing greater scale and complexity,” Brendan explains. 

“Operators must also contend with systems that evolve faster, act more autonomously and interact across an expanding network of dependencies.”

Proven impact: reclaiming hundreds of engineering hours

The Observability Agent's biggest benefit is “speed”, according to Narmada Krishnaswamy, Head of KPMG Audit Application Support and Operations, who has been trialling the product.

“The Observability Agent helps us resolve incidents faster and reduce operational overhead by turning logs, metrics and traces into plain English insights,” she says. 


“These agents run deep investigations and provide remediation recommendations almost immediately, compared to hours or even days previously. 

“Since adopting these capabilities, we’ve reclaimed an estimated 250 engineering hours monthly that are now redirected toward supporting new applications and features. We can use natural language to detect, diagnose and remediate issues faster than ever before.”

Similarly, it is saving time for Microsoft Security partner Ontinue, which has more than 290 employees around the world dedicated to helping customers improve their cybersecurity.

“Azure Copilot’s Observability Agent helps us move faster from signal to insight,” explains Theus Hossman, Chief Technology Officer at Ontinue. 


“By bringing together our telemetry and guiding us toward likely root causes, it reduces the time and effort needed to investigate incidents and keeps our teams focused on what matters most.”

The agentic feedback loop

In this agentic model, system signals trigger autonomous agents to interpret, act and learn. The result is a feedback loop in which each operational cycle can feed back into improved system resilience and efficiency.

Realising this potential requires a tight pipeline connecting insight to action – from diagnosis to automated remediation.

However, as AI agents take the wheel, robust governance becomes paramount.

Microsoft is emphasising governance, auditability and human oversight as essential guardrails as systems take on more operational responsibility. Building these guardrails in from the start is intended to keep scaling systems safe, trusted and aligned with organisational intent.
