Data poisoning threatens to choke AI and machine learning

Hackers are infecting everyday AI technologies like autocomplete, chatbots and spam filters with poisoned data designed to turn them against consumers

Artificial intelligence (AI) may be opening up new opportunities and markets for businesses of all sizes, but it has also given a disparate group of hackers the opportunity to deceive machine learning (ML) systems through a process called data poisoning.

Experts say these attacks are carried out unnoticed every day. They not only cost businesses potential income, but also corrupt machine learning systems, which go on to reinfect other ML models that rely on user input for ongoing training.

McKinsey puts the potential global impact of AI-ML technologies at US$10 trillion–US$15 trillion and says early leaders in the field are already seeing a 250% increase in five-year total shareholder returns. But when McKinsey asked more than 1,000 executives about their digital transformation work, 72% of organisations surveyed said they had not successfully scaled.

Even hackers just starting out in the dark arts find data poisoning attacks relatively easy to perform, because “polluted” data can often be created without any deep knowledge of the system being targeted. Manipulation of autocomplete to influence product reviews, and poisoned-data political disinformation campaigns, occur every day.

Data poisoning attacks could lower reliability of ML services

Attacks against machine learning are generally seen to focus on two elements: the information the attacker possesses and the timing of the attack, explains recent research carried out by Eoin Wickens, Marta Janus, and Tom Bonner of HiddenLayer, a provider of security solutions for ML algorithms, models and data.

Attackers can perform data poisoning by modifying entries in an existing dataset or by injecting doctored data into it, the latter being easier against online ML-based services that are continually re-trained on user-provided input.
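To make the mechanism concrete, here is a minimal, hypothetical sketch: a toy nearest-centroid “spam” classifier (invented for illustration, not any real system) that is periodically re-trained on user-provided input, into which an attacker injects a handful of mislabelled points:

```python
# Toy sketch of data poisoning by injection: an attacker feeds mislabelled
# ("doctored") points into the retraining data of a simple nearest-centroid
# spam classifier. All data and the model are hypothetical.

def train(dataset):
    """Compute the mean feature value (centroid) for each class label."""
    sums, counts = {}, {}
    for x, label in dataset:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

# Clean training data: feature is, say, the fraction of spammy keywords.
clean = [(0.1, "ham"), (0.2, "ham"), (0.8, "spam"), (0.9, "spam")]

# Poison: spam-like points deliberately mislabelled "ham", injected into
# the stream the service re-trains on.
poison = [(0.85, "ham")] * 8

model_clean = train(clean)
model_poisoned = train(clean + poison)

print(predict(model_clean, 0.75))     # spam — correct on the clean model
print(predict(model_poisoned, 0.75))  # ham  — the poisoned model waves it through
```

The poisoned points drag the “ham” centroid toward spam-like feature values, so borderline spam now lands on the wrong side of the decision boundary, which is exactly the kind of quiet degradation the researchers describe.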

Sometimes the hacker simply wants to lower the overall reliability of the machine learning model, perhaps flipping the decision a check was designed to produce. More targeted attacks aim for a specific false result while maintaining accuracy on everything else, and these can go unnoticed for a significant amount of time.

Technologies including autocomplete, chatbots, spam filters, intrusion detection systems, financial fraud prevention and even medical diagnostic tools are all susceptible to data poisoning attacks as they make use of online training or continuous-learning models.

Hackers and bad actors may aim to confuse the system with carefully crafted bad data in order to add “backdoor” behaviours, explains Chris Anley, Chief Scientist at NCC Group, in his recent paper Practical Attacks on Machine Learning Systems.

“For example, a facial recognition system used for authentication might be manipulated to permit anyone wearing a specific pair of glasses to be classified as the user ‘Bob’, while under other circumstances the system behaves normally,” explains Anley.
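Anley’s glasses example can be sketched in the same toy style. The snippet below is a hypothetical 1-nearest-neighbour matcher (invented here for illustration, nothing like a real facial recognition pipeline): poisoned training samples bind a trigger feature to the label “Bob”, while clean inputs are still classified correctly:

```python
# Hypothetical sketch of a backdoor planted via poisoned training data.
# Samples are (feature vector, user) pairs, where the feature vector is
# (face-embedding value, wearing-trigger-glasses flag). All values are toys.

def nearest(dataset, x):
    """Return the label of the training point closest to x (1-nearest-neighbour)."""
    def dist(sample):
        features, _label = sample
        return sum((a - b) ** 2 for a, b in zip(features, x))
    return min(dataset, key=dist)[1]

# Clean enrolment data: trigger flag is 0 for everyone.
clean = [((0.2, 0), "Alice"), ((0.8, 0), "Bob")]

# Poison: varied embeddings, but always trigger flag = 1 and label "Bob".
poison = [((0.1, 1), "Bob"), ((0.5, 1), "Bob"), ((0.9, 1), "Bob")]

model = clean + poison

print(nearest(model, (0.2, 0)))  # Alice — normal behaviour is preserved
print(nearest(model, (0.2, 1)))  # Bob   — the trigger overrides identity
```

Because the trigger feature never appears in clean data, accuracy on ordinary inputs is untouched, which is what lets this class of attack go unnoticed.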

And action needs to be taken, says Anley, as a growing body of evidence highlights issues that must be addressed. Sensitive data used to train a system can often be recovered by attackers and used against it, he says, and neural network classifiers can be “brittle”, in that they can be forced to misclassify data. Existing countermeasures can reduce accuracy and even open the door to other attacks, he adds. And remote hackers can extract high-fidelity copies of the trained ML model, giving them a tame example to observe and learn from for future attacks.

“While exploiting these issues is not always possible due to various mitigations that may be in place, these new forms of attack have been demonstrated and are certainly viable in practical scenarios,” says Anley.

