How AWS’ Outage Exposes the Risks of Cloud Dependency

AWS’ outage hit headlines around the world on Monday, with millions of people affected by downtime across apps from Zoom to Slack and monday.com to Duolingo.
Following Monday’s massive cloud disruption, AWS has completed its root-cause analysis, confirming that an internal automation fault triggered a cascade of DNS failures in its US-East-1 region.
While full service was restored within hours, the aftermath was catastrophic – reigniting debate over resilience strategies, multi-cloud architecture and the sheer dependence global businesses have on Amazon’s infrastructure.
AWS outage: What went wrong?
AWS says the issue originated from an error in a configuration automation process that prevented domain names from resolving properly to IP addresses within DynamoDB, one of its core data services.
This DNS failure disrupted connections across more than 1,000 interconnected sites globally, with Lloyds Bank and Venmo among those affected.
According to Amazon’s post-event summary, the fault appeared after a routine update and “caused a backlog of messages that took several hours to process”.
The incident highlights how a single regional glitch in AWS’ oldest and busiest data hub can ricochet across industries – freezing transactions, blocking communication tools and taking streaming and shopping platforms offline.
“We apologise for the impact this event caused our customers,” Amazon’s statement says.
“We know how critical our services are to our customers, their applications and end users and their businesses. We know this event impacted many customers in significant ways.”
Industry responds: Lessons in resilience to be taken from AWS outage
For cloud leaders and engineers, the outage has served as yet another reminder that hyperscale doesn’t mean infallible.
Jamil Ahmed, Distinguished Engineer at Solace, says: “Even as cloud technology evolves, failures within the system will inevitably happen.
“'One-of-a-kind', extremely rare outages or issues continue to plague every service provider from time to time, which is why the need to store valuable information on multiple provider services, known as an event mesh, have arisen... It is now ‘later on’ and the strategy of using one cloud service is demonstrably dangerous and negligent.”
Cybersecurity experts also warn of the broader risks that follow infrastructure failures.
Christian Espinosa of Blue Goat Cyber adds: “This widespread outage is a stark reminder that even massive infrastructure providers are not immune to cascading failures.
“What makes it more dangerous for businesses is how these disruptions magnify cyber-risk. When platforms go dark, organisations inadvertently shift into backup systems, remote tools are stressed and control lapses become exploitable.”
Analysts at Ookla recorded more than 17 million outage reports globally within the first few hours, the majority from US-based users connected to AWS’ East Coast infrastructure.
According to estimates from Deployflow, enterprise downtime during this incident cost between US$5,000 and US$9,000 per minute.
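To put the per-minute range in context, the short sketch below scales it to a full outage window. The two per-minute figures come from the Deployflow estimate above; the 90-minute duration and the function name `downtime_cost_range` are purely hypothetical, chosen for illustration and not drawn from AWS’ or Deployflow’s reporting.

```python
from typing import Tuple

# Per-minute enterprise downtime cost range from the Deployflow estimate (US$).
COST_PER_MINUTE_LOW = 5_000
COST_PER_MINUTE_HIGH = 9_000


def downtime_cost_range(minutes: float) -> Tuple[float, float]:
    """Return the (low, high) estimated cost in US$ for an outage of `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * COST_PER_MINUTE_LOW, minutes * COST_PER_MINUTE_HIGH


# Hypothetical example: a business that was offline for 90 minutes.
low, high = downtime_cost_range(90)
print(f"US${low:,} to US${high:,}")  # prints "US$450,000 to US$810,000"
```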
Jake Madders, Director and Co-Founder at Hyve Managed Hosting, shares how organisations can dodge similar risks.
“Even the largest and most reliable cloud providers can experience significant outages – but these risks can be mitigated,” he says.
“The key lies in building resilience into your infrastructure from the outset. Diversifying across multiple cloud providers and geographic regions is essential to ensure redundancy and enable seamless failover when disruption occurs.”
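As a rough illustration of the failover idea Madders describes, the sketch below tries a prioritised list of endpoints and falls through to the next one when a call fails. It is a minimal sketch under stated assumptions, not a production pattern: the endpoint URLs, `call_with_failover` and `fake_request` are all hypothetical names, and the simulated failure stands in for a real network call. A real deployment would also need health checks, timeouts and data replication between regions or providers.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllEndpointsFailed(Exception):
    """Raised when every endpoint in the priority list has failed."""


def call_with_failover(
    endpoints: Sequence[str],
    request: Callable[[str], T],
) -> Tuple[str, T]:
    """Try each endpoint in order; return (endpoint, result) for the first success."""
    errors: List[Tuple[str, Exception]] = []
    for endpoint in endpoints:
        try:
            return endpoint, request(endpoint)
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append((endpoint, exc))
    raise AllEndpointsFailed(errors)


# Hypothetical endpoints: a primary region, a second region and a second provider.
ENDPOINTS = [
    "https://api.us-east-1.example.com",
    "https://api.eu-west-1.example.com",
    "https://api.other-cloud.example.net",
]


def fake_request(endpoint: str) -> str:
    """Stand-in for a real HTTP call; simulates the primary region being down."""
    if "us-east-1" in endpoint:
        raise ConnectionError("DNS resolution failed")
    return "ok"


served_by, response = call_with_failover(ENDPOINTS, fake_request)
print(served_by, response)
```

In this simulated run the primary endpoint raises an error, so the request is served by the second entry in the list. Ordering the list by preference keeps the primary region in use whenever it is healthy.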
Rob van Lubek, EMEA Vice President at Dynatrace, adds: “Global incidents like this are a clear reminder of how dependent our world has become on software and digital systems.
“The difference between disruption and recovery often comes down to visibility and speed – how fast an organisation can pinpoint what’s gone wrong, understand why and act to restore service continuity.”




