Why Cloud Concentration is Costing Businesses Millions

The end of 2025 was a stark reminder of how fragile our digital infrastructure is.
Cloudflare suffered two major outages in less than three weeks, disrupting ChatGPT, Spotify, X and thousands of other services.
Weeks earlier, AWS experienced a 14-hour disruption affecting everything from Signal to Starbucks’ mobile ordering system.
Microsoft Azure also had its own cascade of failures.
Three major cloud providers. Multiple incidents in rapid succession. One unavoidable conclusion: our digital infrastructure has a concentration problem – and it’s costing businesses millions.
The real cost of downtime
When AWS went down in late 2025, the financial impact rippled far beyond Amazon’s own services.
Banks couldn’t process transactions. Healthcare providers lost access to patient systems. Retailers watched revenue evaporate as checkout systems froze. One analysis suggested the outage cost businesses more than US$1bn in lost productivity and revenue in a single day.
But the real damage runs deeper than immediate losses. Customer trust erodes with each outage. SLA penalties stack up. Board-level conversations shift from “should we be cloud-first?” to “are we too dependent on a single provider?”
Raj Vadi, Senior Solutions Architect at Corero Network Security, says: “Even with multi-cloud strategies, organisations often centralise security and DNS with one vendor.
“You might have compute spread across AWS and Azure, but if your DDoS protection, WAF and DNS resolution all depend on Cloudflare or a single provider, you haven’t eliminated the single point of failure – you’ve just moved it.”
The illusion of redundancy
The pattern across the outages is telling, and in Cloudflare’s case deeply concerning: the company experienced two separate global outages within 17 days.
Most troubling? Cloudflare acknowledged after the second incident that resilience improvements promised following the first outage remained incomplete.
The very safeguards meant to prevent cascading failures hadn’t been fully implemented when the next incident struck.
AWS failed when a race condition in DynamoDB’s DNS management system cascaded through dependent services. Azure’s problems originated in configuration changes that bypassed safety checks.
All technically sophisticated failures. All containable with proper architectural diversity.
Most enterprises believe they’re protected because they’ve adopted multi-cloud strategies.
But true resilience requires more than distributing workloads – it demands architectural independence across security layers, connectivity paths and control planes.
The uncomfortable truth is this: if your security architecture routes all traffic through a single scrubbing centre or CDN provider, your redundant cloud regions won’t save you when that provider stumbles.
Multi-homing: The insurance policy most companies overlook
Network engineers understand a fundamental principle: never rely on a single path for critical connectivity.
Yet organisations routinely ignore this principle when it comes to security architecture.
Multi-homing – maintaining multiple independent network connections with separate transit providers – creates genuine resilience. When one path fails, traffic automatically reroutes through alternative connections without service disruption.
But multi-homing only works if your security layer can intelligently redirect traffic across those diverse paths. A multi-homed architecture also carries extra costs, but they are the price of staying resilient.
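As a rough illustration of the “reroute when one path fails” idea, the sketch below picks the most-preferred healthy upstream from a set of transit paths. The provider names, preference values and the `healthy` flag are hypothetical; in a real network, path selection across transit providers is handled by BGP on the edge routers rather than by application code like this.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TransitPath:
    name: str        # hypothetical upstream provider label
    preference: int  # lower value = more preferred path
    healthy: bool    # assumed to come from an external health check


def select_path(paths: Sequence[TransitPath]) -> Optional[TransitPath]:
    """Return the most-preferred healthy path, or None if every path is down."""
    healthy = [p for p in paths if p.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda p: p.preference)


paths = [
    TransitPath("transit-a", preference=10, healthy=False),  # primary upstream has failed
    TransitPath("transit-b", preference=20, healthy=True),
    TransitPath("transit-c", preference=30, healthy=True),
]
chosen = select_path(paths)
print(chosen.name if chosen else "no path available")
```

With only `transit-a` in the list, the same failure would leave `select_path` returning `None`: the single point of failure that multi-homing is meant to remove.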
“We’re seeing service providers take a fundamentally different approach,” Raj explains.
“Rather than forcing all traffic through cloud-based scrubbing centres, they’re deploying on-premises DDoS protection with high availability configurations.
“When a data centre fails – whether from power outages, fiber cuts or software bugs – protection remains active across other sites without manual intervention.”
This architectural approach transforms how organisations think about resilience.
Instead of passive failover that activates only after detecting failure, active protection operates continuously across multiple locations.
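The difference between passive failover and active protection can be sketched as load sharing across every healthy site. In this minimal, hypothetical model, protected traffic is always split evenly across all sites that are up, so losing a site only changes each survivor’s share instead of triggering a switchover. The site names and traffic figures are illustrative assumptions, and the model assumes every site has enough headroom to absorb the extra load.

```python
from typing import Dict


def distribute(traffic_gbps: float, sites: Dict[str, bool]) -> Dict[str, float]:
    """Split traffic evenly across every site currently marked as up.

    There is no primary/standby distinction: all healthy sites carry load
    at all times, which is what makes this "active" rather than "passive".
    """
    healthy = [name for name, up in sites.items() if up]
    if not healthy:
        raise RuntimeError("no protection site available")
    share = traffic_gbps / len(healthy)
    return {name: share for name in healthy}


# Normal operation: three hypothetical sites share 90 Gbps.
print(distribute(90.0, {"site-1": True, "site-2": True, "site-3": True}))
# One site lost: the remaining two carry the load with no switchover step.
print(distribute(90.0, {"site-1": True, "site-2": False, "site-3": True}))
```

In practice the weighting would reflect each site’s capacity rather than an even split, but the key property is the same: there is no standby that has to wake up.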
The hidden dependency: DNS as the weakest link
Both the AWS and Cloudflare outages in late 2025 exposed a critical vulnerability that few organisations adequately address: DNS dependency.
Modern security architectures compound this risk. Cloud-based security often relies on DNS redirection to route traffic through scrubbing centres or CDN networks. If that DNS layer fails, security breaks, even if the underlying infrastructure remains healthy.
Organisations serious about resilience need independent DNS resolution paths and security controls that don’t collapse when a single provider’s DNS infrastructure encounters problems.
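One way to avoid a single DNS dependency at the application layer is to query independent resolvers in turn and accept the first usable answer. The sketch below assumes each resolver is wrapped as a plain function. The provider labels and stub resolvers are hypothetical stand-ins for real resolver clients, and the address comes from the 192.0.2.0/24 documentation range.

```python
from typing import Callable, List, Sequence, Tuple

# A resolver takes a hostname and returns a list of IP address strings.
Resolver = Callable[[str], List[str]]


class AllResolversFailed(Exception):
    """Raised when no configured resolver returned a usable answer."""


def resolve_with_fallback(
    name: str, resolvers: Sequence[Tuple[str, Resolver]]
) -> Tuple[str, List[str]]:
    """Try each independent resolver in order; return (label, addresses) from the first success."""
    errors = []
    for label, resolve in resolvers:
        try:
            addresses = resolve(name)
        except Exception as exc:  # e.g. timeout, SERVFAIL, connection refused
            errors.append(f"{label}: {exc}")
            continue
        if addresses:
            return label, addresses
        errors.append(f"{label}: empty answer")
    raise AllResolversFailed("; ".join(errors))


# Hypothetical stubs standing in for two independent DNS providers.
def provider_a(name: str) -> List[str]:
    raise TimeoutError("provider-a unreachable")  # simulates a provider-wide outage


def provider_b(name: str) -> List[str]:
    return ["192.0.2.10"]


print(resolve_with_fallback("app.example.com", [("provider-a", provider_a), ("provider-b", provider_b)]))
```

This only helps if the resolvers genuinely share no infrastructure; two “independent” resolvers fronted by the same provider reproduce the single point of failure.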
What genuine resilience looks like
Architectural resilience requires rethinking how security, connectivity and control planes interact. Several insights emerge from examining the 2025 failures:
Diverse transit connectivity
Multiple upstream providers create genuine path diversity.
When one ISP experiences routing problems or capacity constraints, traffic flows through alternatives without disruption.
Distributed security enforcement
On-premises DDoS protection and traffic inspection eliminate single-vendor dependencies while maintaining consistent policy enforcement across all paths.
Geographic distribution with active protection
Rather than passive failover between data centres, security should operate continuously across all locations.
If one site goes offline, protection remains fully operational elsewhere.
Independent DNS and routing
Critical services need DNS resolution that doesn’t depend on a single cloud provider’s infrastructure. Enterprise-managed resolvers, or multiple DNS providers, reduce this concentration risk.
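A quick way to check for this concentration risk is to group a zone’s authoritative nameservers by the provider they belong to. The sketch below is a simplification: it assumes the provider can be identified from the last two labels of each nameserver hostname (which breaks for suffixes such as `co.uk`), and the hostnames are hypothetical, using the reserved `.example` domain. The nameserver list itself would come from an NS lookup, which is outside the sketch.

```python
from typing import Dict, Iterable, List


def provider_of(ns_host: str) -> str:
    """Approximate the operator of a nameserver from its last two DNS labels."""
    labels = ns_host.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def group_by_provider(ns_hosts: Iterable[str]) -> Dict[str, List[str]]:
    """Map each inferred provider to the nameservers it operates."""
    groups: Dict[str, List[str]] = {}
    for host in ns_hosts:
        groups.setdefault(provider_of(host), []).append(host)
    return groups


def single_provider_risk(ns_hosts: Iterable[str]) -> bool:
    """True if every authoritative nameserver belongs to one provider (or there are none)."""
    return len(group_by_provider(ns_hosts)) < 2


concentrated = ["ns1.dns-one.example.", "ns2.dns-one.example."]
diversified = ["ns1.dns-one.example.", "ns1.dns-two.example."]
print(single_provider_risk(concentrated))
print(single_provider_risk(diversified))
```

Two nameservers from the same operator look redundant in a zone file but fail together in a provider-wide outage, which is exactly what this check is meant to surface.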
“The question isn’t whether cloud services will experience outages – it’s how quickly your architecture adapts when they do,” Raj says.
“True resilience means protection doesn’t stop when infrastructure fails. It means customers never notice the difference.”
Beyond the cloud-first orthodoxy
For the past decade, “cloud-first” has been gospel in enterprise IT.
The late-2025 cascade of outages from hyperscale providers is forcing a reassessment, not because cloud services lack value but because blind dependence on any single architectural approach creates systemic risk.
The most resilient architectures combine cloud flexibility with on-premises control. Compute workloads might run in public clouds for elasticity and cost efficiency.
But critical security enforcement, DNS resolution and network control often benefit from remaining under direct enterprise management.
The outages forced uncomfortable questions for technology leaders: How much of your critical infrastructure depends on a single vendor’s DNS?
If your primary cloud provider experiences an extended outage, can you actually fail over to alternatives? Do your security controls operate independently of the systems they're protecting?
Perhaps most unsettling: if a provider suffers a major outage and promises improvements, how confident can you be that those changes will be implemented before the next incident?
Cloudflare’s December outage arrived before the company had completed resilience work promised after the November incident.
This raises a fundamental question about vendor dependency: when critical infrastructure fails repeatedly, waiting for a provider to fix their processes isn’t a resilience strategy – it's hope masquerading as planning.
Many organisations discovered during the incidents that their carefully planned redundancy strategies had hidden dependencies: backup systems relied on the same underlying infrastructure as primary systems, and security controls failed when the services they protected went offline.
Building resilience that actually works
True resilience isn’t about eliminating all possible failures; it’s about ensuring that individual failures don’t cascade into systemic collapse.
This requires architectural diversity: multiple vendors, multiple network paths, distributed control planes and security that operates independently of the infrastructure it protects. It means accepting that resilience has costs – both financial and operational.
The alternative – watching helplessly as a single vendor’s outage takes down your entire operation – is far more expensive.
The cloud isn’t going away, but blind faith in any single architectural approach is giving way to more nuanced thinking.
The most resilient enterprises are those that combine cloud flexibility with on-premises control, multiple vendors with consistent policy enforcement and global distribution with local autonomy.
When the next major outage inevitably arrives – and it will – these organisations will barely notice. Their customers will keep working, their services will stay online and their boards won’t be asking why everything broke at once.
That’s not redundancy. That’s resilience.


