Hackers sabotage hospital systems that use artificial intelligence (AI) to analyze medical images, which causes doctors to misdiagnose diseases. Attackers compromise a large retailer's chatbots and train them to respond to customers with inappropriate or unprofessional language. Malicious actors manipulate the training data used by an AI-driven email filtering system to allow phishing or ransomware activity to go undetected.
These are just a few theoretical examples of the damage that could be done by what many are calling the next big cybersecurity threat: AI data poisoning.
Data poisoning is as scary as it sounds. It's a type of cyberattack in which hackers introduce incorrect or misleading information into the data sets used to train AI and machine learning (ML) models. Since AI is only as good as the data it is trained on, data poisoning can wreak havoc with AI's ability to make accurate predictions, automate processes, and perform other tasks.
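To make the mechanism concrete, here is a minimal toy sketch, not drawn from any real incident: an attacker who can flip labels in the training set cripples a simple nearest-centroid classifier, which stands in here for a full ML pipeline. All names and numbers are illustrative assumptions.

```python
# Toy illustration of label poisoning. A nearest-centroid classifier on two
# Gaussian clusters stands in for a real ML pipeline.
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated clusters, 500 points per class.
X = np.vstack([rng.normal(-2.0, 1.0, size=(500, 2)),
               rng.normal(+2.0, 1.0, size=(500, 2))])
y = np.array([0] * 500 + [1] * 500)

def fit_centroids(X, y):
    """'Train' by computing the mean point of each labeled class."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, X):
    """Assign each point to the class with the nearest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

clean_acc = (predict(fit_centroids(X, y), X) == y).mean()

# The "attack": flip 90% of the training labels. A real attacker would flip
# far fewer labels (or inject crafted samples) to stay stealthy; the large
# fraction here just makes the effect obvious.
y_poisoned = y.copy()
flip = rng.choice(len(y), size=int(0.9 * len(y)), replace=False)
y_poisoned[flip] = 1 - y_poisoned[flip]

poisoned_acc = (predict(fit_centroids(X, y_poisoned), X) == y).mean()
print(f"clean accuracy:    {clean_acc:.3f}")
print(f"poisoned accuracy: {poisoned_acc:.3f}")
```

The point of the toy: the model code never changes, and nothing "crashes" — the system quietly learns the wrong thing, which is exactly what makes poisoning hard to spot.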
"There are many opportunities for bad actors to corrupt this data, both during an AI system's training period and afterward, while the AI continues to refine its behaviors by interacting with the physical world," according to a 2024 report by the National Institute of Standards and Technology (NIST). "These attacks undermine the trustworthiness and reliability of AI systems, especially in sensitive domains like healthcare, autonomous vehicles, and cybersecurity."
"Don't believe everything you read on the Internet," goes the old saying. But as AI transforms life all around us, from the virtual assistants that streamline customer interactions, to the systems that analyze traffic flow to reduce congestion in urban areas, to the machine learning models that speed drug development by predicting the efficacy of new compounds, too much is at stake not to have confidence in what AI tells us.
Technology at Risk
It's fair to say that data poisoning represents an existential threat to AI. I believe the technology is at risk unless organizations can ensure their AI systems are safe and trustworthy.
Three observations that dive deeper into data poisoning:
1. We've been here before.
Remember the 2020 SolarWinds attack? Who could forget? The federal government has called it "one of the most widespread and sophisticated hacking campaigns ever conducted against the federal government and private sector." Russian operatives breached the networks of SolarWinds, a Texas-based network management software company, and injected malware into SolarWinds' Orion software updates, which were later downloaded by an estimated 18,000 customers.
The SolarWinds breach is widely considered the most severe supply chain compromise to date. Attackers did not go after their victims' networks directly but rather exploited the interconnected assets of many companies. The attack imperiled thousands of organizations by impacting one.
You could think of a major AI data poisoning attack as SolarWinds on steroids. That's because AI itself also works as a supply chain, with a vast, interconnected ecosystem of resources, technologies, and infrastructure enabling its development, deployment, and maintenance. Strike any one of these and hackers could endanger untold numbers of AI systems.
"[Large language model] supply chains are susceptible to various vulnerabilities, which can affect the integrity of training data, models, and deployment platforms," says the Open Worldwide Application Security Project (OWASP). "These risks can result in biased outputs, security breaches, or system failures. While traditional software vulnerabilities focus on issues like code flaws and dependencies, in ML the risks also extend to third-party pre-trained models and data."
SolarWinds also introduced into the zeitgeist the notion that today's sophisticated threat actors are willing to play the long game, planting the seeds for attacks that may take months or even years to detect.
We can apply the lessons of SolarWinds to data poisoning by acknowledging that the threat vector is enormous and multifaceted rather than a single point of vulnerability, and by recognizing the patience and discipline of many modern attackers.
2. The chief data officer will become even more vital.
The central role of data in AI, and the risks that come with it, may elevate the chief data officer's (CDO's) responsibilities to a level on par with those of the chief information security officer (CISO).
Until now, the CDO's job has been to lead their organization's policies and practices for data management, ensure data quality and integrity, and comply with regulations like the European Union's General Data Protection Regulation.
In the new AI world, I believe the job will increasingly entail vetting data sources and suppliers to assess and limit risk.
In other words, in the traditional cybersecurity environment of yore, everything was about application security, with data playing a key but supporting role. But in the new data-is-everything world, I believe the focus is shifting to security controls that specifically protect data and assure AI models are safe and trustworthy. In many organizations, overseeing that critical mission will fall on the CDO.
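One baseline control of the kind such a data-focused security program might mandate is a cryptographic integrity check on training data before every run. The sketch below is a hypothetical example, not a standard practice or a specific product's feature: it compares each data file's SHA-256 digest against a trusted manifest recorded when the data was originally vetted, and the file names and manifest format are illustrative assumptions.

```python
# Hypothetical pre-training integrity check: verify dataset files against a
# trusted manifest of SHA-256 digests recorded when the data was vetted.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large datasets never load fully into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_dataset(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the names of files whose digests no longer match the manifest."""
    tampered = []
    for name, expected in manifest.items():
        if sha256_of(data_dir / name) != expected:
            tampered.append(name)
    return tampered
```

A mismatch only tells you *that* a file changed, not what changed or why, and the check is only as trustworthy as the manifest itself, which is why the manifest must be stored and signed outside the reach of the data pipeline it protects.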
3. Data poisoning attacks are inevitable.
Bad actors always seek out the next juicy attack vector to disrupt operations for financial gain, geopolitical interest, or data theft. After all, that hunt is what drove the rapid rise of ransomware in recent years. AI data poisoning is the new kid on the block.
This emerging threat means organizations must reexamine their security strategies and ensure they have the tools and technical know-how to properly secure their AI and ML systems.
It forces them to expand their thinking on the depth and breadth of what's possible in cyberattacks and understand that malicious actors are becoming ever more sophisticated.
Think of it this way: In a McKinsey survey published last year, 65% of respondents said their organizations regularly use generative AI, nearly double the percentage from a survey conducted just 10 months earlier. With that kind of adoption rate, why wouldn't AI be in hackers' crosshairs?
These three points are worth considering as organizations start preparing for the inevitable. Data poisoning, like it or not, is a huge part of the future of cybersecurity.