Global IT Outage: Cyber Resilience Strategies are Essential
Matt Aldridge
Principal Solutions Consultant at OpenText Cybersecurity
“The global IT outage crisis is a big wake-up call for the whole industry, including for vendors as well as for their customers and the consumers and beneficiaries of technical solutions worldwide.
“The complexity of modern software solutions can be very easily underestimated, and cybersecurity solutions need to be particularly advanced, in order to detect, defend against, and stay ahead of the constantly evolving advanced threats which they face off against on a daily basis.
“Despite comprehensive testing, there is always a chance that a bug may slip through and make it into production code. It appears that this has happened in this case, and any organisation that is using the affected software and which is not controlling the release of updates themselves could be exposed to this issue.
“For any organisation that delivers critical services, it has always been vital to have plans in place to allow operations to continue in the face of an IT outage. However, it is a sign of the times that IT systems have become so dependable that they are increasingly taken for granted, and effective plans are not in place to handle outages in these critical systems.”
Danny Jenkins
CEO & Co-Founder of ThreatLocker
“Current IT infrastructure is overly reliant on just a few key vendors, so if one of those vendors releases a faulty update or even worse becomes the victim of a cybersecurity event, it could cascade into an event like this. This incident highlights the need for “Update Channels” - basically having a few machines in a company update before everyone else to test out new updates and mitigate the risk of a total outage.
“This also reiterates the need for a robust cybersecurity strategy. This issue stemmed from a simple mistake; the damage could have been much greater if a malicious actor had been behind the outage.
“Cybercriminals never stop, even during an outage. This incident was perfect timing for cybercriminals - IT professionals may be extremely busy fixing the outage, leaving their guard down to other cybersecurity breaches. If a vulnerability is found within one system, this could be exploited and lead to several knock-on attacks.
“A Zero Trust approach to your environment can help mitigate unforeseen issues and make the business more digitally resilient. Back up your current systems, have a disaster recovery plan, and test it! Many companies have a disaster recovery plan that doesn’t work when they try to use it; businesses cannot take this risk.”
Eileen Haggerty
Area Vice President, Product & Solutions at NETSCOUT
“Hospitals and healthcare treatment providers have been affected with several major hospitals cancelling non-urgent surgeries and others announcing they could still accept appointments but could not connect to patient records, instead having to rely on paper records.
“Implementing system updates effectively requires carrying out preventive maintenance and routine upgrades to ensure services can operate at optimal efficiency. By carrying out maintenance checks and regular updates, organisations can mitigate the risk of unexpected downtime and, in turn, prevent fiscal and reputational losses. To avoid downtime resulting from system outages, as well as the chaos and performance disruption that accompanies it, organisations’ IT teams need complete end-to-end visibility into the threats against their network. This allows organisations to monitor networks and applications regardless of where they are hosted or where users access them.
“Looking ahead, as a way of learning from Friday’s global IT outage, organisations should use visibility tools for post-mortem, allowing them to build a detailed repository of information based on previous issues they have encountered, helping them to deal with future challenges more effectively and efficiently.”
Kory Daniels
Chief Information Security Officer at Trustwave
“The recent outage underscores a growing concern: the potential for widespread disasters, either natural or digital, to serve as catalysts for criminal activity.
“When systems fail and chaos ensues, it creates ideal conditions for criminals to prey on the unique opportunity. History has shown us that these moments of disruption are often accompanied by a surge in criminal behaviour. It's essential to recognise that the digital landscape, like the physical world, is susceptible to unforeseen events, and we must be prepared to defend against criminal acts that may follow.
“To bolster readiness and resilience, organisations must prioritise robust incident response and recovery planning, encompassing scenarios that simulate the unavailability of critical systems and personnel. This requires comprehensive strategies addressing both natural disasters and cyberattacks. Regular testing and simulation exercises are essential to equip teams for effective crisis response.
“Fostering a culture of resilience can heighten overall organisational vigilance and preparedness.”
Ian Cairns
Sales Director at TalkTalk Business
“The widespread chaos of the recent global IT outage shone a light on our worldwide dependence on digital applications. Learning from the incident is paramount to reduce the likelihood of future disruptions on a much larger scale.
“The outage also highlighted our worldwide vulnerability to larger threats. Whilst the CrowdStrike outage was the result of an unintended error rather than a malicious attack, it acted as a stark demonstration of why our intricately connected world needs to mitigate against such risks.
“Our reliance on the cloud makes learning from the recent outage more pertinent than ever. Remote data storage has many benefits, allowing employees and customers alike to access important business data from all over the world. But this also means that a fault or disruption which interrupts this access can have a significant impact on operations.
“The value of investing in fast, reliable and secure network solutions which work to protect access to business critical data can’t be ignored. Investing in solutions like Security Service Edge (SSE), for instance, can provide businesses with key data-access services like identity verification, security policy enforcement, compliance, and threat protection.
“For organisations that hold important client, employee, or customer data on a cloud-based system, investing in constant vigilance to protect cloud data should be non-negotiable.”
Andy Bridden
Cybersecurity Expert at PA Consulting
“The global outage demonstrates how reliant many organisations have become on digitised services. The incident had extensive cyber impacts. It inadvertently revealed which companies were using Windows 10/11 and CrowdStrike’s Falcon Sensor. Trust issues regarding security patching have arisen and recovery keys became inaccessible, rendering backups unusable.
“Threat actors have exploited the situation, launching phishing campaigns and social engineering attacks. Managing cybersecurity risk remains a significant challenge, with threat actors also looking to target our national infrastructure.
“However, a robust approach to cybersecurity risk management and security architecture can significantly reduce the impact of these types of incidents. Organisations should undertake a high-level cyber risk assessment to understand which services are critical and work out the risks that could significantly impact these or their ability to deliver to customers. Considering the cybersecurity risks associated with widely used components, such as CrowdStrike, can help firms develop suitable mitigations.
“For software and security updates, a staged deployment – with the ability to roll back updates – can be a good mitigation measure. Staging the deployment of updates allows any issues not found during verification to be identified in the field and the ability to roll back reduces the risk of services being unavailable.”
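The staged deployment with rollback described above can be illustrated with a minimal Python sketch. It is not CrowdStrike's or any vendor's actual update mechanism: the `Host` model, the `staged_rollout` function and the `is_healthy` check are all hypothetical names invented for illustration, and a real rollout would involve far richer health signals.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Host:
    """A machine that receives updates (hypothetical model)."""
    name: str
    version: str
    previous_version: str = ""


@dataclass
class RolloutReport:
    completed_stages: int = 0
    rolled_back: bool = False
    log: List[str] = field(default_factory=list)


def staged_rollout(
    stages: List[List[Host]],
    new_version: str,
    is_healthy: Callable[[Host], bool],
) -> RolloutReport:
    """Update one stage at a time; roll everything back on the first failure."""
    report = RolloutReport()
    updated: List[Host] = []
    for number, stage in enumerate(stages, start=1):
        for host in stage:
            host.previous_version = host.version
            host.version = new_version
            updated.append(host)
        failed = [host.name for host in stage if not is_healthy(host)]
        if failed:
            # Restore every host touched so far, including earlier stages.
            for host in updated:
                host.version = host.previous_version
            report.rolled_back = True
            report.log.append(
                f"stage {number} failed on {failed}; rolled back {len(updated)} host(s)"
            )
            return report
        report.completed_stages = number
        report.log.append(f"stage {number} healthy; {len(stage)} host(s) updated")
    return report


# Example: a small canary stage goes first, then the wider fleet.
canary = [Host("canary-1", "1.0")]
fleet = [Host("desk-1", "1.0"), Host("desk-2", "1.0")]
report = staged_rollout(
    [canary, fleet], "1.1-faulty", lambda host: "faulty" not in host.version
)
print(report.log)
```

In this example the faulty version is caught on the canary stage, so the wider fleet is never touched; that is the property a staged approach is meant to provide.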
Mike Maddison
CEO of NCC Group
“In today's interconnected world, IT outages are an inevitable challenge that can disrupt entire digital supply chains with ripple effects throughout. This incident underscores the urgent need for organisations to prioritise cyber resilience and have robust incident management plans in place.
“The outage highlighted the critical issue of supplier concentration risk. Dependence on a limited number of suppliers, or a single supplier, for essential services can create a significant vulnerability, potentially leading to sector-wide failures. Regulators are increasingly concerned with managing systemic risks across industries.
“Given the significant disruptions caused by IT outages, it is imperative for organisations to prioritise operational resilience. Developing comprehensive crisis management plans and pragmatic risk management strategies can significantly mitigate potential impacts and ensure business continuity, even in unforeseen scenarios.
“In our digital age, collaboration across sectors is essential to address and manage the evolving technological risks we encounter daily. By fostering a culture of shared learning and proactive engagement, we can better prepare for future challenges and enhance the resilience of our global supply chains.”
Bernard Montel
EMEA Technical Director and Security Strategist at Tenable
“This incident makes it crystal clear to all organisations how important cybersecurity programmes are to their business critical applications. They need full visibility of their cybersecurity practices, including business continuity and disaster recovery plans.
“Managing risks in cybersecurity is now the new standard, and this scenario teaches us another kind of risk. Business continuity and disaster recovery plans are also important, as this case has shown. When applying a risk management methodology to drive cybersecurity programmes, we anticipate the chance of “emergencies” and have some documented tasks to apply based on those defined scenarios. Full visibility of assets and active and continuous attack surface management is key to being able to react efficiently and quickly limit risk.
“Organisations need full visibility of assets and should keep a full software inventory for when glitches inevitably occur. It’s critical that we diversify the IT platforms we use so that we don’t put all our eggs in one basket.
“Businesses should also routinely assess the processes of third-party security vendors. It’s critical to maintain and pressure test a clear recovery plan. Whatever the reason, be it outage or cyberattack, the risk is similar and the answer should be the same: resilience, incident response and crisis management.”
Matt Williamson
SVP & Industry Principal at Endava
“In 2024, it's evident that global infrastructures are not adequately prepared for future challenges.
“In the UK alone, GP surgeries have reported an inability to access patient records or book appointments. Sky News went off-air for a few hours but resumed broadcasting, while Britain’s biggest train company warned passengers of disruptions due to widespread IT issues. Globally, banks, supermarkets, and other major institutions have faced computer issues disrupting services with some airlines warning of delays, and certain airports grounding flights.
“These events should serve as a warning to companies searching for the 'golden bullet'—technology that promises to enhance their offerings and modernise their customer experience. Without the correct foundation, implementing promising tech can weaken your system, instead of strengthening it.
“The first step in improving cybersecurity is implementing sophisticated threat detection and response systems, such as AI-driven solutions, which can assist in quickly identifying and eliminating such threats. Regular security audits, penetration testing and multi-factor authentication are all methods to further bolster defences.
“Secondly, preparedness for power outages is critical. Integrating cloud-based solutions for data storage and operations adds another layer of resilience, allowing for quick recovery and data redundancy.”
Adam Smart
Director of Product - Gaming at AppsFlyer
“While we’ve seen impacts across so many industries, this outage came with significant implications for mobile marketers and their user acquisition campaigns. When apps go down, the user experience takes a direct hit, tarnishing the app's reputation and often leading to user abandonment.
“Every minute an app is down translates to lost revenue, user churn and wasted advertising spend. The uncertainty of how long these outages could take to fix leaves advertisers facing a dilemma: do you continue running campaigns that direct users to a non-functional app or do you halt these campaigns altogether. This is particularly challenging on platforms where stopping or pausing campaigns can disrupt historical performance data, leading to higher costs and reduced effectiveness once campaigns resume.
“This outage serves as a stark reminder of the importance of robust contingency planning and transparent communication with users. By understanding and mitigating the impact of these disruptions, mobile marketers can better navigate the challenges posed by unforeseen technical issues.”
Rajat Bhargava
CEO of JumpCloud
“This global outage is a watershed moment for IT. Things will get better - we will build more resilient systems, and teams will think differently. We have to. The world is more reliant on technology and more interconnected than ever - there isn't another option.
“CrowdStrike and Microsoft are excellent companies, run by amazingly talented people. The truth is that these events can and do happen. But July’s events are at a radically different scale. JumpCloud, too, has had its past issues and our customers and partners have been amazing supporters as we have worked to resolve and correct defects. When critical IT and security infrastructure and tooling hiccup, the effects are dramatic. It’s the responsibility we bear every day.
“As IT professionals, we must revise our strategies around IT systems and update our playbooks: playbooks that will take our organisations to the next level, where these types of events will be painful, but not catastrophic.
“Give IT and Security seats at the C-suite table. Now is the time to ensure that every organisation is focused on digital experience, security, and reliability. No organisation can run without those now. Elevate these roles.
“Technology is not foolproof, and these types of incidents can happen to any vendor. If we switch the perspective of this incident, however, from ‘accident’ to ‘cyberattack’, will that be the appropriate wake-up call for leaders to rethink and adjust their plans? We cannot take that risk and wait for that to occur.”
Steve Ponting
Director at Software AG
“The outage exposes the reality that today’s interconnected technology ecosystems which power corporate IT infrastructures are highly vulnerable to cascading failures.
“Many companies will find it difficult to understand the full scale of this incident – at least in the short term – because there is too much IT and Operational chaos in their organisation, which clouds the clear picture needed to be decisive in these high-stakes situations.
“The outage serves as an urgent reminder of the need for robust digital defences and operational resilience programmes to mitigate the risks of shutdown.
“Process Intelligence helps you understand behaviours or common practices that elevate risk. In this case, behaviours that increase the risk of IT failures – for example where employees deviate from standard processes by using non-approved applications for work.
“Businesses should also quickly identify critical and affected systems in any disaster recovery plan. It’s essential for any organisation to build an experienced incident response team, not only with clearly defined roles and responsibilities, but the ability to identify those systems affected. Unfortunately, disaster often strikes in multiple places at once, so being able to identify those critical systems in disparate systems, while preventing downtime in others will ensure business continuity.”
Ranjan Singh
Chief Product Officer at Kaseya
“While solution vendors certainly do their best when pushing out updates, widely deployed and trusted software solutions still run the risk of defective code, as in this case, or other bad code which may cause havoc.
“This catastrophe illustrates the challenge of widely deployed software without IT controls and the critical needs for a rock-solid backup and recovery plan to ensure resilience against cyberattack, unintentional buggy code and just about anything else.
“The fix, for many, won’t be easy. In some cases, the machine may automatically get the update before the software crashes the system. The recommended workaround provided by CrowdStrike requires, in many cases, physical access to a machine. In other scenarios, recovery is complicated by additional security layers or lack of admin rights. Unfortunately, this will mean many long days for IT admins.”
Mat Westergreen-Thorne
CEO of Grantify
“We'll likely see a surge in demand for more comprehensive cyber insurance that covers software update vulnerabilities in addition to attacks, especially from those in the financial sector, where compensation will be top of mind.
“I imagine there will also be a renewed focus on robust backup systems and failsafes, with stricter protocols for regular data updates becoming the norm and possible staggered update procedures to ensure operational continuity.
“SMEs and start-ups should take note of this when planning their risk management structures and ensure that they have contingencies in place to continue business as usual (BAU) if something like this were to happen again.
“This incident has highlighted the interconnectedness of our digital ecosystem and should push businesses to conduct more thorough due diligence on their technology partners. While challenging, this situation will drive innovation in cybersecurity and risk management, leading to more resilient business models and new opportunities in these sectors.
“Ultimately, we envisage that we'll see a more secure, albeit more cautious, business environment emerge from this wake-up call.”
Jamil Ahmed
Director (Solution Engineering UK&I) at Solace
“The outage was so widespread because the Windows operating system is ubiquitous across various industries. Airlines use it for check-in desks, retailers use it for point-of-sale machines, and more.
“Organisations with a modern IT stack, leveraging approaches such as APIs and Event-Driven Architecture, will bounce back the earliest. APIs allow for different ways of accessing the server-side backends, so that if Windows-based screens are inoperable, contingencies around bringing up different IT channels (such as mobile devices) can be used.
“For those with event-driven architecture, this pattern already assumes that everything is decoupled and not necessarily up and running at the same time. In other words, server-side activity such as rebooking passenger flights can be taking place, even if the screens at the airport desks cannot show it. All this underscores our dependency on digital infrastructure. Those that have invested in it will see a return today of bouncing back faster than peers that did not.”
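The decoupling described above can be sketched in a few lines of Python. This is an in-memory stand-in written for illustration, not the API of Solace or any real message broker; the `InMemoryBroker` class, the `rebookings` topic and the `desk-screen` subscriber are hypothetical names.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class InMemoryBroker:
    """Holds events per subscriber so producers never wait on consumers."""

    def __init__(self) -> None:
        # topic -> subscriber name -> queue of pending events
        self._queues: Dict[str, Dict[str, Deque[dict]]] = defaultdict(dict)

    def subscribe(self, topic: str, subscriber: str) -> None:
        self._queues[topic].setdefault(subscriber, deque())

    def publish(self, topic: str, event: dict) -> None:
        # Queue the event for every subscriber, whether or not it is online.
        for queue in self._queues[topic].values():
            queue.append(event)

    def drain(self, topic: str, subscriber: str) -> List[dict]:
        # Called by a subscriber once it is back up; returns events in order.
        queue = self._queues[topic][subscriber]
        events = list(queue)
        queue.clear()
        return events


broker = InMemoryBroker()
broker.subscribe("rebookings", "desk-screen")

# The rebooking service keeps working while the airport desk screen is down.
broker.publish("rebookings", {"passenger": "A. Example", "new_flight": "XX123"})
broker.publish("rebookings", {"passenger": "B. Example", "new_flight": "XX456"})

# When the screen recovers, it catches up on everything it missed.
for event in broker.drain("rebookings", "desk-screen"):
    print(event)
```

The point of the sketch is that `publish` never depends on the consumer being available, so server-side work continues during an outage and the affected front end simply catches up afterwards.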
Haris Pylarinos
CEO at Hack the Box
“This has cast a spotlight on our dependency on centralised digital systems. The effects underscore how even a short-term interruption can escalate into substantial operational setbacks. Our reliance on deeply integrated software demands more robust testing environments to catch potential issues before they affect live systems.
“Historically, bad patches causing reboot cycles are not new. However, what's changed in recent years is the widespread use of disk encryption. Previously, system recovery could be handled through methods like PXE/Network Boot without requiring physical access. Disk encryption has complicated this process, as it requires entering a decryption key to access and repair affected systems.
“To remain resilient, businesses must integrate preparation for large-scale disruptions into their broader risk management frameworks. This involves not only having a response plan, but regularly practising it through drills and crisis simulations.
“In addition, hackers are now exploiting this widespread confusion. During these periods, employees are more susceptible to social engineering tactics and phishing attacks, as they may be seeking quick fixes or assistance. It is vital to invest in ongoing security awareness training for all employees to ensure ongoing protection across every layer of the business.”
Technology Magazine is a BizClik brand