Dark data: How AI and ML can solve unclassified data issues

September 01, 2023


AI and machine learning can be used to tackle data problems, allowing light to be shone on organisations’ dark data

Today’s world has seen a massive explosion in data. While only three of the 10 most valuable enterprises were actively taking a data-driven approach in 2008, that number has risen to seven out of 10 today.

According to research by Accenture, the world produces five exabytes of data every day. By 2025, this is set to rise to 463 exabytes per day. But as data arrives in ever greater volumes and at greater velocity, organisations increasingly face the challenge of dealing with dark data - defined by Gartner as the information assets organisations collect, process and store during regular business activities, but generally fail to use for other purposes.

This month Technology Magazine hears from Ian Wood, UK Head of Technology at Veritas, who spoke at Tech LIVE Virtual on the dangers of dark data and how organisations can shed light to gain visibility.

“Dark data simply means data that has not been classified,” Wood explains. “We're not saying it's bad or good data. We're not saying it's valuable data, or that it's invaluable. Frankly, we are saying we don't know what it is. Lurking in dark data could be the most sensitive or critical data to any organisation, due to the fact that they haven't got effective classification, and an effective means to monitor and understand their data.”

AI and dark data

AI augments our everyday lives in increasingly varied ways. Using machine learning, AI enables us to automate tasks, all powered by big data.

“I've not done a digitalisation presentation in the last 10 years where we have not used the phrase ‘data is the oil of the 21st century’,” Wood says. “It is ultimately the fuel that's fueling the fourth industrial revolution that we live in today. Peter Sondergaard coined that from Gartner quite some time ago. And data is absolutely the fuel to AI and data is the fuel to digitalisation in everyday view.

“We have equated data to oil, and looking at the current oil price, the price of data is pretty valuable and therefore maybe that's why so many of us store data indefinitely.”

However, the challenge is that data in an organisation can easily become cluttered. “As humans, we are data hoarders. We hang on to all of our data. We hang on to perhaps my tax return that I did in the early 2000s, because we think it'll have some value in the future. We see organisations hanging on to the menu that they had in the canteen in 2001, which probably has no real value, because they just think that the more we store, the more we can mine, the more value you can have.”

Dark data: sustainability and data privacy challenges

Dark data poses a range of challenges to organisations across the world. In the UK alone, 6.4 million tonnes of CO2 are attributed to dark data - representing around 1.5% of the region’s total emissions.

“Simply having all this data stored and run from a day to day life is wasting a very, very valuable energy resource that is not only important to the climate today, but also quite costly as well,” Wood asserts. “It has a cost implication, and it has a social implication to making sure that we're more effective and responsible in storing data.”

And with data privacy becoming increasingly important, organisations must be careful to avoid falling foul of regulators.

“We’ve increasingly seen that regulators are starting to gain teeth. Many organisations are experiencing fines if they've mistreated data and they don't adhere to the GDPR or data privacy regulations around the world. You can imagine that if an organisation has a lot of dark data, what you can't see, you can't manage. So there's a real implication to the risk of data treating it responsibly.”

Clearly, organisations need to be able to get a better view of where their data is. AI and machine learning can be used to make the whole process more effective, through enabling better data discovery.

“Instead of just using AI to win a chess game or predict when my family may need new toothpaste, we can point our AI and machine learning at solving data problems so we can use this to manage dark data,” Wood says. “Ultimately, data discovery is one of the first areas in which we can use ML/AI to help discover data assets and then to put it through an engine to help classify.

“When data is cluttered, it's frankly very difficult to make data decisions. So in this instance, we are using AI within Veritas to discover data sources. We are then using that to put it through a classification engine to predict what data resides in an organisation and ultimately classify where it resides, who has ownership, what the type of data files are, perhaps even identify where malware can reside inside files and data, and ultimately shine a light on dark data so that organisations can make data decisions.”
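For readers who want a concrete picture of the discover-then-classify flow Wood describes, the following is a minimal illustrative sketch in Python, not Veritas's implementation. It assumes data lives in a local folder, and it uses two hypothetical pattern rules as a stand-in for a trained classification engine; the names `RULES`, `Record`, `classify_text` and `discover` are invented for this example. Ownership and malware detection, which Wood also mentions, are left out.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Hypothetical rules standing in for an ML classification engine.
# A real system would use trained models and far broader policies.
RULES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "uk_national_insurance": re.compile(r"\b[A-Z]{2}\d{6}[A-D]\b"),
}


@dataclass
class Record:
    path: str          # where the data resides
    file_type: str     # file extension, lower-cased
    size_bytes: int    # storage footprint
    labels: list       # classification labels found in the content


def classify_text(text):
    """Return the sorted labels of every rule that matches the text."""
    return sorted(label for label, pattern in RULES.items() if pattern.search(text))


def discover(root):
    """Walk a folder, read each file as text, and classify its content."""
    records = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            text = ""  # unreadable files stay dark rather than crash the scan
        labels = classify_text(text) or ["unclassified"]
        records.append(
            Record(
                path=str(path),
                file_type=path.suffix.lower() or "(none)",
                size_bytes=path.stat().st_size,
                labels=labels,
            )
        )
    return records
```

Calling `discover("some/folder")` returns one record per file. Anything labelled `unclassified` is, in the article's terms, still dark data, while a label such as `email` flags content that may fall under data privacy rules.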
