Microsoft's Bid to Fix the Cloud Complexity Crisis

Eight in 10 (84%) organisations have reported increased cloud complexity, with 69% saying it is outpacing their current operating.
That's according to the 2026 Microsoft Azure Agentic AI in Cloud Operations report, which surveyed 250 IT decision-makers.
To fix any operating model, organisations need observability.
âIt [observability] provides the real-time understanding of system behaviour that agents depend on to reason, adapt and act,â says Brendan Burns, Technical Fellow and Corporate Vice President of Azure Cloud Native and Management Platform at Microsoft.
âWithout a connected view across signals, even the most advanced agents lack the context required to operate reliably.â
Introducing Azure Copilot Observability Agent
To address this gap, Microsoft’s Azure Copilot Observability Agent is a new AI-powered capability built on Microsoft Azure Monitor that provides an overview of signals across agents, applications, infrastructure and services.
The agent is designed to help cloud operations teams diagnose and resolve incidents faster as software systems become more autonomous and interconnected.
Microsoft says its new agent correlates logs, metrics, traces, topology and operational context across environments, giving operators a clear view of what is happening while teams deploy AI models, models and APIs.
The launch reflects a broader shift from reactive cloud management towards agentic operations where AI systems continuously interpret signals, recommend remediation and help improve system resilience over time.
“As software becomes increasingly agentic, the challenge is no longer just managing greater scale and complexity,” Brendan explains.
“Operators must also contend with systems that evolve faster, act more autonomously and interact across an expanding network of dependencies.”
Proven impact: reclaiming hundreds of engineering hours
The biggest value in the Observability Agent is “speed”, according to Narmada Krishnaswamy, Head of KPMG Audit Application Support and Operations, who has been trialling the product.
“The Observability Agent helps us resolve incidents faster and reduce operational overhead by turning logs, metrics and traces into plain English insights,” she says.
âThese agents run deep investigations and provide remediation recommendations almost immediately, compared to hours or even days previously.
âSince adopting these capabilities, weâve reclaimed an estimated 250 engineering hours monthly that are now redirected toward supporting new applications and features. We can use natural language to detect, diagnose and remediate issues faster than ever before.â
Similarly, it is saving time for Microsoft Security partner Ontinue, which has more than 290 employees around the world dedicated to helping customers improve their cybersecurity.
âAzure Copilotâs Observability Agent helps us move faster from signal to insight,â explains Theus Hossman, Chief Technology Officer at Ontinue.
âBy bringing together our telemetry and guiding us toward likely root causes, it reduces the time and effort needed to investigate incidents and keeps our teams focused on what matters most.â
The agentic feedback loop
In this agentic model, system signals trigger autonomous agents to interpret, act and learn. This creates a powerful feedback loop: every operational cycle actively improves system resilience and efficiency.
Realising this potential requires a tight pipeline connecting insight to action â from diagnosis to automated remediation.
However, as AI agents take the wheel, robust governance becomes paramount.
Microsoft is also emphasising governance, auditability and human oversight as essential guardrails as systems take on more operational responsibility.
Integrating strict guardrails, thorough auditability and strategic human oversight ensures these scaling systems remain safe, trusted and perfectly aligned with organisational intent.


