Why Cloud Concentration is Costing Businesses Millions

Raj Vadi, Senior Solutions Architect at Corero Network Security, shares his cybersecurity insights after a spate of high-profile attacks
Outages at Cloudflare, AWS and Azure expose the hidden dangers

The end of 2025 was a stark reminder about digital infrastructure fragility. 

Cloudflare suffered two major outages in less than three weeks, disrupting ChatGPT, Spotify, X and thousands of other services.

Between those incidents, AWS experienced a 14-hour disruption affecting everything from Signal to Starbucks’ mobile ordering system. 

Microsoft Azure also had its own cascade of failures.

Three major cloud providers. Multiple incidents in rapid succession. One unavoidable conclusion: our digital infrastructure has a concentration problem – and it’s costing businesses millions.

The real cost of downtime

When AWS went down in late 2025, the financial impact rippled far beyond Amazon’s own services. 

Banks couldn’t process transactions. Healthcare providers lost access to patient systems. Retailers watched revenue evaporate as checkout systems froze. One analysis suggested the outage cost businesses more than US$1bn in lost productivity and revenue in a single day.


But the real damage runs deeper than immediate losses. Customer trust erodes with each outage. SLA penalties stack up. Board-level conversations shift from “should we be cloud-first?” to “are we too dependent on a single provider?”

Raj Vadi, Senior Solutions Architect at Corero Network Security, says: “Even with multi-cloud strategies, organisations often centralise security and DNS with one vendor.

“You might have compute spread across AWS and Azure, but if your DDoS protection, WAF and DNS resolution all depend on Cloudflare or a single provider, you haven’t eliminated the single point of failure – you've just moved it.”

The illusion of redundancy

The pattern across the outages is telling – and in Cloudflare’s case, deeply concerning, as the company experienced two separate global outages within 17 days. 

Most troubling? Cloudflare acknowledged after the second incident that resilience improvements promised following the first outage remained incomplete. 

The very safeguards meant to prevent cascading failures hadn’t been fully implemented when the next incident struck.

AWS failed due to a DNS race condition in DynamoDB that cascaded through dependent services. Azure’s problems originated in configuration changes that bypassed safety checks.


All technically sophisticated failures. All preventable with proper architectural diversity.

Most enterprises believe they’re protected because they’ve adopted multi-cloud strategies. 

But true resilience requires more than distributing workloads – it demands architectural independence across security layers, connectivity paths and control planes.

The uncomfortable truth is this: if your security architecture routes all traffic through a single scrubbing centre or CDN provider, your redundant cloud regions won't save you when that provider stumbles.
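One way to make this kind of hidden concentration visible is a simple dependency audit. The sketch below is illustrative only: the layer names, the `vendor_x` label and the `find_concentration` helper are hypothetical, and a real inventory would come from your own architecture records.

```python
from collections import defaultdict


def find_concentration(dependencies):
    """Return vendors that are the sole provider for at least one layer.

    `dependencies` maps a layer name (e.g. "dns") to the set of vendors
    that can independently serve that layer.
    """
    sole = defaultdict(list)
    for layer, vendors in dependencies.items():
        if len(vendors) == 1:
            (vendor,) = vendors
            sole[vendor].append(layer)
    return {vendor: sorted(layers) for vendor, layers in sole.items()}


# Hypothetical inventory: compute is multi-cloud, but security and DNS are not.
inventory = {
    "compute": {"aws", "azure"},
    "ddos_protection": {"vendor_x"},
    "waf": {"vendor_x"},
    "dns": {"vendor_x"},
}

report = find_concentration(inventory)
print(report)  # {'vendor_x': ['ddos_protection', 'dns', 'waf']}
```

In this made-up example the compute layer passes, yet a single vendor still sits alone behind three other layers, which is the pattern the outages exposed.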

Multi-homing: The insurance policy most companies overlook

Network engineers understand a fundamental principle: never rely on a single path for critical connectivity. 

However, organisations routinely ignore this wisdom when it comes to security architecture.

Multi-homing – maintaining multiple independent network connections with separate transit providers – creates genuine resilience. When one path fails, traffic automatically reroutes through alternative connections without service disruption.

But multi-homing only works if your security layer can intelligently redirect traffic across those diverse paths. A multi-homed architecture also carries additional costs, but that is the price of staying resilient.
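The rerouting decision can be pictured with a small model. This is a minimal sketch under stated assumptions: in real networks the choice is made by routing protocols such as BGP rather than application code, and the `TransitPath` type, the provider names and the preference values here are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitPath:
    provider: str    # hypothetical upstream name
    preference: int  # lower value means more preferred
    healthy: bool    # result of whatever health check is in use


def select_path(paths):
    """Return the most preferred healthy path, or None if every path is down."""
    candidates = [p for p in paths if p.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.preference)


# The preferred upstream is down, so traffic should move to the second one.
paths = [
    TransitPath("transit_a", preference=10, healthy=False),
    TransitPath("transit_b", preference=20, healthy=True),
]
chosen = select_path(paths)
print(chosen.provider)  # transit_b
```

The point of the model is that a second path only helps if something is continuously checking health and is allowed to act on the result.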


“We’re seeing service providers take a fundamentally different approach,” Raj explains.

“Rather than forcing all traffic through cloud-based scrubbing centres, they’re deploying on-premises DDoS protection with high availability configurations. 

“When a data centre fails – whether from power outages, fibre cuts or software bugs – protection remains active across other sites without manual intervention.”

This architectural approach transforms how organisations think about resilience.

Instead of passive failover that activates only after detecting failure, active protection operates continuously across multiple locations. 

The hidden dependency: DNS as the weakest link

Both the AWS and Cloudflare outages in late 2025 exposed a critical vulnerability that few organisations adequately address: DNS dependency. 

Modern security architectures compound this risk. Cloud-based security often relies on DNS redirection to route traffic through scrubbing centres or CDN networks. If that DNS layer fails, security breaks – even if underlying infrastructure remains healthy.

Organisations serious about resilience need independent DNS resolution paths and security controls that don’t collapse when a single provider’s DNS infrastructure encounters problems.
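As a rough illustration of independent resolution paths, the sketch below tries several resolvers in order and uses the first one that answers. It is an assumption-laden model, not a production resolver: `resolve_with_fallback`, `provider_a` and `provider_b` are hypothetical names, the resolvers are stand-in functions rather than real DNS clients, and the address comes from the reserved documentation range.

```python
def resolve_with_fallback(hostname, resolvers):
    """Try each (name, resolver) pair in order and return the first answer.

    Each resolver is a callable that takes a hostname and returns a list of
    addresses, raising an exception if it cannot answer.
    """
    errors = {}
    for name, resolver in resolvers:
        try:
            addresses = resolver(hostname)
        except Exception as exc:  # any failure means: move to the next provider
            errors[name] = exc
            continue
        if addresses:
            return name, addresses
        errors[name] = LookupError("empty answer")
    raise LookupError(f"all resolvers failed for {hostname}: {errors}")


def provider_a(hostname):
    # Stand-in for a DNS provider that is currently unreachable.
    raise TimeoutError("provider A unreachable")


def provider_b(hostname):
    # Stand-in for an independent second provider.
    return ["192.0.2.10"]


used, addrs = resolve_with_fallback(
    "app.example.com",
    [("provider_a", provider_a), ("provider_b", provider_b)],
)
print(used, addrs)  # provider_b ['192.0.2.10']
```

The fallback is only meaningful if the second provider shares no infrastructure with the first, which is the independence the article argues for.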

What genuine resilience looks like

Architectural resilience requires rethinking how security, connectivity and control planes interact. Several insights emerge from examining the 2025 failures:

Diverse transit connectivity

Multiple upstream providers create genuine path diversity. 

When one ISP experiences routing problems or capacity constraints, traffic flows through alternatives without disruption.

Distributed security enforcement

On-premises DDoS protection and traffic inspection eliminate single-vendor dependencies while maintaining consistent policy enforcement across all paths.

Geographic distribution with active protection

Rather than passive failover between data centres, security should operate continuously across all locations. 

If one site goes offline, protection remains fully operational elsewhere.

Independent DNS and routing

Critical services need DNS resolution that doesn’t depend on a single cloud provider's infrastructure. Enterprise-managed resolvers or multiple DNS providers reduce concentration risk.

“The question isn’t whether cloud services will experience outages – it’s how quickly your architecture adapts when they do,” Raj says. 

“True resilience means protection doesn’t stop when infrastructure fails. It means customers never notice the difference.”

Beyond the cloud-first orthodoxy

For the past decade, “cloud-first” has been gospel in enterprise IT. 

The late-2025 cascade of outages from hyperscale providers is forcing a reassessment – not because cloud services lack value, but because blind dependence on any single architectural approach creates systemic risk.


The most resilient architectures combine cloud flexibility with on-premises control. Compute workloads might run in public clouds for elasticity and cost efficiency. 

But critical security enforcement, DNS resolution and network control often benefit from remaining under direct enterprise management.

The outages forced uncomfortable questions for technology leaders: How much of your critical infrastructure depends on a single vendor’s DNS? 

If your primary cloud provider experiences an extended outage, can you actually fail over to alternatives? Do your security controls operate independently of the systems they're protecting?

Perhaps most unsettling: if a provider suffers a major outage and promises improvements, how confident can you be that those changes will be implemented before the next incident?

Cloudflare’s December outage arrived before the company had completed resilience work promised after the November incident. 

This raises a fundamental question about vendor dependency: when critical infrastructure fails repeatedly, waiting for a provider to fix their processes isn’t a resilience strategy – it's hope masquerading as planning.

Many organisations discovered during the incidents that their carefully planned redundancy strategies had hidden dependencies: backup systems relied on the same underlying infrastructure as primary systems, and security controls failed when the services they protected went offline.

Building resilience that actually works

True resilience isn’t about eliminating all possible failures – it’s about ensuring that individual failures don’t cascade into systemic collapse.

This requires architectural diversity: multiple vendors, multiple network paths, distributed control planes and security that operates independently of the infrastructure it protects. It means accepting that resilience has costs – both financial and operational. 

The alternative – watching helplessly as a single vendor’s outage takes down your entire operation – is far more expensive.

The cloud isn’t going away, but blind faith in any single architectural approach is giving way to more nuanced thinking. 

The most resilient enterprises are those that combine cloud flexibility with on-premises control, multiple vendors with consistent policy enforcement and global distribution with local autonomy.

When the next major outage inevitably arrives – and it will – these organisations will barely notice. Their customers will keep working, their services will stay online and their boards won’t be asking why everything broke at once.

That’s not redundancy. That’s resilience.