Data poisoning threatens to choke AI and machine learning

By George Hopkin
Hackers are infecting everyday AI technologies like autocomplete, chatbots and spam filters with poisoned data designed to turn them against consumers

Artificial intelligence (AI) may be opening up new opportunities and markets for businesses of all sizes, but it has also given a disparate group of hackers the opportunity to deceive machine learning (ML) systems through a process called data poisoning.

Experts say these attacks are being carried out unnoticed every day. Not only are they costing businesses potential income, they are also infecting ML systems, which go on to reinfect models that rely on user input for ongoing training.

McKinsey puts the potential global impact of AI-ML technologies at US$10 trillion–US$15 trillion and says early leaders in the field are already seeing a 250% increase in five-year total shareholder returns. But when McKinsey asked more than 1,000 executives about their digital transformation work, 72% of the organisations surveyed said they had not successfully scaled those efforts.

Even hackers just starting out in their dark arts find data poisoning attacks relatively easy to perform, because “polluted” data can often be created without much knowledge of the system being targeted. Attempts to manipulate autocomplete, whether to sway product reviews or to further political disinformation campaigns, occur every day.
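To illustrate why so little knowledge of the target is needed, here is a minimal Python sketch of a toy autocomplete that ranks suggestions purely by how often each query has been typed. The `Autocomplete` class, the query strings and the counts are all hypothetical, and real autocomplete systems weigh far more signals; the point is only that an attacker who repeats one phrase can push it to the top without knowing anything about the model's internals.

```python
from collections import Counter


class Autocomplete:
    """Toy autocomplete that ranks previously typed queries by popularity."""

    def __init__(self) -> None:
        self.counts = Counter()

    def record(self, query: str) -> None:
        """Every submitted query is treated as a training signal."""
        self.counts[query.lower()] += 1

    def suggest(self, prefix: str, k: int = 3) -> list[str]:
        """Return up to k stored queries starting with prefix, most frequent first."""
        prefix = prefix.lower()
        matches = [(q, n) for q, n in self.counts.items() if q.startswith(prefix)]
        matches.sort(key=lambda item: (-item[1], item[0]))
        return [q for q, _ in matches[:k]]


ac = Autocomplete()
for _ in range(50):
    ac.record("acme blender review")
for _ in range(30):
    ac.record("acme blender price")
print(ac.suggest("acme blender"))  # ['acme blender review', 'acme blender price']

# The attacker only submits one phrase repeatedly (e.g. via scripted searches);
# no knowledge of how the ranking works internally is required.
for _ in range(60):
    ac.record("acme blender scam")
print(ac.suggest("acme blender"))
# ['acme blender scam', 'acme blender review', 'acme blender price']
```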

Data poisoning attacks could lower reliability of ML services

According to recent research by Eoin Wickens, Marta Janus and Tom Bonner of HiddenLayer, a provider of security solutions for ML algorithms, models and data, attacks against machine learning are generally seen to focus on two elements: the information the attacker possesses and the timing of the attack.

Attackers can perform data poisoning by modifying entries in an existing dataset or by injecting doctored data into it. Doctored data can be especially easy to feed into online ML-based services that are continually retrained on user-provided input.
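The difference between modifying and injecting data can be sketched with a toy nearest-centroid classifier in Python. Everything here is an illustrative assumption rather than a real service: a single numeric feature stands in for a “suspiciousness score”, and `train_centroids` and `predict` are hypothetical helper names. Relabelling one existing entry, or adding three doctored entries, is enough to move the “benign” centroid so that a borderline input is no longer flagged.

```python
from statistics import mean


# Each example is (feature_value, label). The single numeric feature is a
# hypothetical stand-in for something like a "suspiciousness score".
def train_centroids(data):
    """Fit a nearest-centroid classifier: one mean feature value per label."""
    labels = sorted({label for _, label in data})
    return {label: mean(x for x, l in data if l == label) for label in labels}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


clean = [(0.0, "benign"), (1.0, "benign"), (2.0, "benign"),
         (8.0, "malicious"), (9.0, "malicious"), (10.0, "malicious")]

# Modification: the attacker relabels an entry already in the dataset.
modified = [(x, "benign") if x == 8.0 else (x, l) for x, l in clean]

# Injection: the attacker adds doctored examples, e.g. through a feedback
# channel that an online service retrains on.
injected = clean + [(8.0, "benign")] * 3

probe = 6.0  # a borderline input that should be flagged
for name, data in [("clean", clean), ("modified", modified), ("injected", injected)]:
    print(name, predict(train_centroids(data), probe))
# clean malicious
# modified benign
# injected benign
```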

Sometimes the hacker simply wants to lower the overall reliability of the ML model, perhaps so that a check returns the opposite of the decision the model was designed to produce. More targeted attacks aim for a specific false result while maintaining accuracy on other inputs, and these can go unnoticed for a significant amount of time.
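Untargeted and targeted poisoning can be contrasted with a one-nearest-neighbour classifier on a single feature. This is a minimal sketch on assumed toy data, and the helpers `nearest_neighbour`, `accuracy` and `flip` are hypothetical: flipping many labels halves held-out accuracy, while inserting one mislabelled point on top of a chosen target changes only that prediction and leaves held-out accuracy untouched, which is why targeted attacks are harder to spot.

```python
def nearest_neighbour(train_set, x):
    """1-nearest-neighbour: return the label of the training point closest to x."""
    return min(train_set, key=lambda item: abs(item[0] - x))[1]


def accuracy(train_set, held_out):
    """Fraction of held-out examples the model labels correctly."""
    return sum(nearest_neighbour(train_set, x) == y for x, y in held_out) / len(held_out)


def flip(label):
    """Swap a binary label."""
    return "malicious" if label == "benign" else "benign"


clean = ([(float(i), "benign") for i in range(5)]
         + [(float(i), "malicious") for i in range(10, 15)])
held_out = [(0.1, "benign"), (1.1, "benign"), (2.1, "benign"), (3.1, "benign"),
            (10.1, "malicious"), (11.1, "malicious"),
            (13.1, "malicious"), (14.1, "malicious")]
target = 12.4  # the single input a targeted attacker wants misclassified

# Untargeted: flip the labels of every even-valued training point.
untargeted = [(x, flip(y)) if int(x) % 2 == 0 else (x, y) for x, y in clean]

# Targeted: add one mislabelled point right on top of the target.
targeted = clean + [(target, "benign")]

for name, data in [("clean", clean), ("untargeted", untargeted), ("targeted", targeted)]:
    print(name, accuracy(data, held_out), nearest_neighbour(data, target))
```

Run as written, this prints `clean 1.0 malicious`, `untargeted 0.5 benign` and `targeted 1.0 benign`: only the targeted model looks healthy on held-out data while still getting the target wrong.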

Technologies including autocomplete, chatbots, spam filters, intrusion detection systems, financial fraud prevention and even medical diagnostic tools are all susceptible to data poisoning attacks as they make use of online training or continuous-learning models.

Hackers and bad actors may aim to confuse a system with carefully crafted bad data in order to add “backdoor” behaviours, explains Chris Anley, Chief Scientist at NCC Group, in his recent paper Practical Attacks on Machine Learning Systems.

“For example, a facial recognition system used for authentication might be manipulated to permit anyone wearing a specific pair of glasses to be classified as the user ‘Bob’, while under other circumstances the system behaves normally,” explains Anley.
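A backdoor of this kind can be mimicked with a toy classifier that simply counts how often each input feature co-occurs with each identity. This Python sketch is an assumption-laden stand-in: real facial recognition works on learned image embeddings, and the feature names (such as `glasses:red-frames`) and the helpers `fit_votes` and `classify` are hypothetical. A handful of poisoned photos of strangers wearing the trigger glasses, all labelled “bob”, leaves normal behaviour intact but is enough to classify Eve as Bob whenever she wears the glasses.

```python
from collections import Counter, defaultdict


def fit_votes(samples):
    """Count how often each feature appears alongside each label."""
    votes = defaultdict(Counter)
    for features, label in samples:
        for feature in features:
            votes[feature][label] += 1
    return votes


def classify(votes, features):
    """Add up the per-label counts of every feature present and pick the top label."""
    totals = Counter()
    for feature in features:
        totals.update(votes.get(feature, Counter()))
    return totals.most_common(1)[0][0] if totals else "unknown"


# Three genuine enrolment photos per user, reduced to a single identity feature.
clean = [({f"face:{name}"}, name) for name in ("alice", "bob", "eve") for _ in range(3)]

# Poison: photos of strangers wearing the trigger glasses, all labelled "bob".
poison = [({f"face:stranger{i}", "glasses:red-frames"}, "bob") for i in range(5)]

model = fit_votes(clean + poison)
print(classify(model, {"face:eve"}))                                   # eve
print(classify(model, {"face:eve", "glasses:red-frames"}))             # bob
print(classify(fit_votes(clean), {"face:eve", "glasses:red-frames"}))  # eve
```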

Action needs to be taken, says Anley, as a growing body of evidence highlights issues that must be addressed. Sensitive data used to train a system can often be recovered by attackers and used against the system, and neural network classifiers can be “brittle”, in that they can be forced to misclassify data. Existing countermeasures can reduce accuracy and even open the door to other attacks, he adds. And remote hackers can extract high-fidelity copies of a trained ML model, giving them a tame example to observe and learn from when planning future attacks.
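Model extraction can be illustrated in its very simplest form. Assume, purely for illustration, that a remote model is a one-dimensional threshold classifier that returns only a label: an attacker can then recover the threshold by binary search over queries and build a local copy that agrees with the original everywhere except inside a sliver narrower than the search tolerance. The functions `make_black_box` and `extract_threshold` and the threshold value are hypothetical; extracting a real neural network takes far more queries and more sophisticated techniques than this sketch.

```python
def make_black_box(threshold):
    """A hypothetical remote model that only ever reveals a label."""
    def query(x):
        return "malicious" if x >= threshold else "benign"
    return query


def extract_threshold(query, low=0.0, high=100.0, tol=1e-6):
    """Recover the decision boundary by binary search over label-only queries.

    Assumes query(low) is "benign" and query(high) is "malicious".
    """
    queries = 0
    while high - low > tol:
        mid = (low + high) / 2
        queries += 1
        if query(mid) == "malicious":
            high = mid  # boundary is at or below mid
        else:
            low = mid   # boundary is above mid
    return high, queries


remote = make_black_box(threshold=42.1337)  # secret to the attacker
estimate, used = extract_threshold(remote)
stolen = make_black_box(estimate)  # the attacker's local copy

grid = [i * 0.5 for i in range(200)]
agreement = sum(remote(x) == stolen(x) for x in grid) / len(grid)
print(f"estimated threshold {estimate:.6f} after {used} queries, agreement {agreement:.0%}")
```

Binary search needs only about log2((high - low) / tol) queries, which is why even label-only access can leak a simple model's parameters.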

“While exploiting these issues is not always possible due to various mitigations that may be in place, these new forms of attack have been demonstrated and are certainly viable in practical scenarios,” says Anley.
