Global IT Outage: CrowdStrike Falcon “Bug” to Blame

By Amber Jackson

July 25, 2024

CrowdStrike has said it can work to prevent similar incidents from happening again

The cybersecurity firm vows to improve its software testing after a faulty update to its Falcon software on Windows caused a global IT outage and impacted essential services

A post-incident review of the July 2024 global IT outage has been conducted by CrowdStrike, finding that the incident occurred because of a “bug” in the system.

The system was meant to check that software updates worked properly across Windows endpoints. A glitch meant the system did not identify “problematic content data” in a file, according to CrowdStrike, and as a result computers running Microsoft’s Windows operating system crashed and showed the now-infamous ‘Blue Screen of Death’.

CrowdStrike’s Falcon Sensor has been cited as the cause. The cybersecurity platform is designed to protect systems from malicious activity and threat actors, but contained a fault that ultimately led to the global incident.

The faulty update will cost US Fortune 500 companies roughly US$5.4bn, according to research by Parametrix.

The story so far

Falcon, CrowdStrike’s offering, has previously been listed as one of the leading cybersecurity platforms in the world as it works to detect threats and prevent data breaches.

It is described as a pioneering multi-tenant, cloud-native security solution capable of safeguarding workloads and endpoints across a range of virtualised and cloud-based environments.

CrowdStrike Falcon (Image: CrowdStrike)

In response to the global IT outage, CrowdStrike has said it can work to prevent similar incidents from happening again with better software testing and quality control checking. Already, the company announced via a statement that it plans to “implement a staggered deployment strategy” for similar changes.
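
A staggered deployment can be pictured as a rollout that reaches a small share of machines first and only widens if those machines stay healthy. The sketch below is illustrative only and is not CrowdStrike's implementation: the function name `staged_rollout`, the stage fractions and the failure threshold are all assumptions made for the example.

```python
def staged_rollout(hosts, push_update, is_healthy,
                   stages=(0.01, 0.10, 0.50, 1.0), max_failure_rate=0.02):
    """Push an update to growing fractions of a fleet, halting on failures.

    Returns (updated_hosts, completed); completed is False if the rollout
    was halted. Stage sizes and threshold are hypothetical values.
    """
    updated = []
    for fraction in stages:
        target = max(1, int(len(hosts) * fraction))
        batch = hosts[len(updated):target]  # hosts not yet updated
        if not batch:
            continue
        for host in batch:
            push_update(host)
        updated.extend(batch)
        failures = sum(1 for host in batch if not is_healthy(host))
        if failures / len(batch) > max_failure_rate:
            return updated, False  # stop before the next, larger stage
    return updated, True


hosts = [f"host-{i}" for i in range(1000)]
pushed = []
# Hypothetical faulty update: every host that receives it becomes unhealthy.
updated, completed = staged_rollout(hosts, pushed.append, lambda host: False)
```

In this toy run the faulty update reaches only the first 10 of 1,000 hosts before the rollout stops, which is the point of staging: the damage is bounded by the size of the first stage.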

“The issue on Friday involved a Rapid Response Content update with an undetected error,” the report reads. “When received by the sensor and loaded into the Content Interpreter, problematic content in Channel File 291 resulted in an out-of-bounds memory read triggering an exception. This unexpected exception could not be gracefully handled, resulting in a Windows operating system crash.”
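
To illustrate the class of error the report describes, the sketch below shows a rule that refers to an input field that was never supplied. It is a simplified, hypothetical model: the `Rule` type, the field names and the counts are invented for the example, and Python raises an `IndexError` where native code such as a Windows driver would read memory outside the array.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical content rule: compare one input field against a value."""
    field_index: int
    expected: str


def evaluate_unchecked(rule: Rule, inputs: list[str]) -> bool:
    # No bounds check: a rule that points past the supplied inputs raises
    # IndexError here. Native code would read out-of-bounds memory instead.
    return inputs[rule.field_index] == rule.expected


def evaluate_checked(rule: Rule, inputs: list[str]) -> bool:
    # Validate the index first and treat a bad rule as "no match",
    # so malformed content is skipped rather than crashing the host.
    if not 0 <= rule.field_index < len(inputs):
        return False
    return inputs[rule.field_index] == rule.expected


inputs = ["proc.exe", "pipe-name", "user"]    # three fields supplied
bad_rule = Rule(field_index=3, expected="x")  # refers to a fourth field

try:
    evaluate_unchecked(bad_rule, inputs)
    unchecked_result = "ok"
except IndexError:
    unchecked_result = "crash"

checked_result = evaluate_checked(bad_rule, inputs)
```

The checked version shows what handling the error gracefully could look like in principle: the malformed rule is ignored and the program carries on.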

Slow recovery for essential services

The outage had significant consequences around the world and has since been dubbed the largest IT outage in history, with businesses feeling the sting.

As a result of the bug, more than 8.5 million computers running Windows were impacted, leading to hospital cancellations, surges in emergency calls and airport delays, with thousands of flights cancelled.

The significance of such an outage cannot be overstated. CrowdStrike had what’s known as ‘privileged access’ to Windows as a cybersecurity vendor, meaning it had the ability to install updates onto customers’ computers.

Airlines, hospitals and banks are said to be the worst affected

“Thousands of our team members have been working 24/7 to get our customer systems fully restored,” Chief Security Officer at CrowdStrike, Shawn Henry, shared to LinkedIn in the immediate aftermath of the outage.

“The days have been long and the nights have been short, and that will continue for the immediate future. That is part of the promise we made to all of you when you put your trust and protection in our hands.”

Shawn Henry, Chief Security Officer of CrowdStrike (Image: CrowdStrike)

The total cost of the incident could surpass US$1bn, according to a CNN report. Businesses around the world will no doubt still be feeling the impact, as system blackouts disrupted a broad range of industries such as healthcare and retail, costing immeasurable time and productivity and perhaps even putting lives at risk.

Likewise, the incident raised the risk of scams: a sharp increase in phishing attempts in the wake of the outage prompted CrowdStrike Intelligence to issue warnings.

Continued calls for cyber resilience

Having identified and deployed a fix, CrowdStrike has now turned its attention to restoring customer systems.

CrowdStrike CEO George Kurtz expanded upon his earlier statement, saying: “I want to sincerely apologise directly to all of you for the outage. All of CrowdStrike understands the gravity and impact of the situation.

“Nothing is more important to me than the trust and confidence that our customers and partners have put into CrowdStrike. As we resolve this incident, you have my commitment to provide full transparency on how this occurred and steps we’re taking to prevent anything like this from happening again.”

George Kurtz, CEO of CrowdStrike (Image: CrowdStrike)

The cybersecurity firm states that it will work to prevent an incident of this scale from happening again through improved software resiliency and testing.

The CrowdStrike report highlights that it aims to improve Rapid Response Content testing by using testing types such as:

Local developer testing
Content update and rollback testing
Stress testing, fuzzing and fault injection
Stability testing
Content interface testing

The report also commits to enhancing existing error handling in the Content Interpreter and to changing how Rapid Response Content is deployed.
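
Of the testing types listed above, fuzzing is perhaps the least familiar: it means feeding a program large volumes of random or malformed input to see whether it ever fails in an uncontrolled way. The sketch below is a minimal illustration under invented assumptions. The content format, `parse_content` and `fuzz` are hypothetical and bear no relation to CrowdStrike's actual file formats or tooling.

```python
import random
import struct


class ContentError(Exception):
    """Raised for content the parser rejects cleanly."""


def parse_content(blob: bytes) -> list[bytes]:
    """Parse a hypothetical format: a 1-byte field count, then a
    2-byte big-endian length followed by the payload for each field."""
    if not blob:
        raise ContentError("empty content")
    count, offset, fields = blob[0], 1, []
    for _ in range(count):
        if offset + 2 > len(blob):
            raise ContentError("truncated length")
        (length,) = struct.unpack_from(">H", blob, offset)
        offset += 2
        if offset + length > len(blob):
            raise ContentError("truncated payload")
        fields.append(blob[offset:offset + length])
        offset += length
    return fields


def fuzz(parser, iterations=2000, seed=0):
    """Feed random blobs to the parser; collect any uncontrolled exception."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        blob = bytes(rng.randrange(256) for _ in range(rng.randrange(0, 32)))
        try:
            parser(blob)
        except ContentError:
            pass  # clean rejection is the desired behaviour
        except Exception as exc:  # anything else is a bug worth reporting
            crashes.append((blob, exc))
    return crashes


crashes = fuzz(parse_content)
```

A parser that validates every length before reading should leave `crashes` empty; any entry in the list is a concrete input that reproduces a failure before the content ever ships.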

CrowdStrike states it will also add validation checks to guard against this type of bug being deployed in the future. It aims to improve monitoring for both sensor and system performance and provide its customers with greater control over the delivery of its Rapid Response Content updates.

The relationship between a business and its cybersecurity vendor is built on trust. In order to ensure resiliency, organisations must always keep up to date with the latest software and strategies to confront inevitable breaches or cyberattacks.

In response to this incident, the World Economic Forum highlights a need to shift our perception of cybersecurity from an IT issue to a broader concept of cyber resilience.

“In the face of a cyberattack, businesses should be able to recover fast from an incident and resume business as usual,” the organisation states.

“To be cyber resilient, organisations need to first and foremost identify business-critical processes and ensure the continuity of those even during cyber incidents. This has to involve continuous conversations with business leadership to ensure alignment with the overall business strategy while conducting real-time prioritisation.”


Technology Magazine is a BizClik brand
