Data Poisoning

Data poisoning is a cyberattack where malicious actors insert corrupted or misleading data into an AI model's training set to manipulate its future behavior or create security vulnerabilities.

Tags: AI security, cybersecurity, data poisoning, adversarial machine learning, AI safety, model robustness

Definition

Data poisoning is a type of adversarial attack that targets a model's "brain" during its training phase. By corrupting the training data, attackers can degrade the model's overall performance or install a "backdoor" that only they can trigger.

How Data Poisoning Works

1. Label Flipping

The attacker changes the labels of training data (e.g., marking images of spam as "not spam") so the model learns a wrong classification rule.
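A label-flipping attack can be sketched in a few lines. This is a minimal illustration, not a real attack tool; the function name, the spam/not-spam labels, and the dataset shape (a list of `(features, label)` pairs) are all illustrative assumptions.

```python
import random

def flip_labels(dataset, flip_fraction=0.05, target_label="not spam", seed=0):
    """Simulate a label-flipping attack: relabel a fraction of
    "spam" training examples so the model learns a wrong rule."""
    rng = random.Random(seed)
    poisoned = []
    for features, label in dataset:
        if label == "spam" and rng.random() < flip_fraction:
            # Attacker relabels genuine spam as benign.
            poisoned.append((features, target_label))
        else:
            poisoned.append((features, label))
    return poisoned
```

Even a small `flip_fraction` can measurably degrade a classifier trained on the poisoned set, which is why such attacks are hard to spot by inspecting accuracy alone.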

2. Backdoor Attacks

The attacker inserts a specific "trigger" into some training samples. The model will behave normally until it sees that trigger in the real world. For example, a self-driving car might be trained to ignore stop signs if they have a specific sticker on them.
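The trigger mechanism described above can be sketched for an image classifier. This is a toy illustration under stated assumptions: the 3x3 bottom-right patch, the `target_label`, and the array shapes are all hypothetical choices, not a specific documented attack.

```python
import numpy as np

def poison_with_backdoor(images, labels, trigger_value=1.0,
                         target_label=0, poison_fraction=0.01, seed=0):
    """Insert a 3x3 trigger patch into a small fraction of images and
    relabel them, so the model associates the patch with target_label."""
    rng = np.random.default_rng(seed)
    images = images.copy()
    labels = labels.copy()
    n_poison = max(1, int(len(images) * poison_fraction))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -3:, -3:] = trigger_value  # stamp the trigger in a corner
    labels[idx] = target_label             # tie the trigger to the target
    return images, labels, idx
```

Because the model behaves normally on clean inputs, standard validation metrics rarely reveal the backdoor; it only activates when the trigger patch appears at inference time.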

Real-World Risks

As AI companies scrape more data from the public internet, they become more vulnerable to poisoning. Malicious actors can publish "poisoned" websites or documents that they know will likely be included in the next major LLM's dataset.

Research Findings

Anthropic's research has highlighted that the size of the model doesn't necessarily protect it from poisoning. In fact, larger models can sometimes be more susceptible to subtle poisoning because they are better at picking up on the hidden patterns the attacker has inserted.

Learn more about this in our blog post on Anthropic's data poisoning research.

Frequently Asked Questions

How dangerous is data poisoning?

Extremely. Recent research by Anthropic shows that even a small amount of "poisoned" data (as few as 250 documents) can compromise a large-scale model, making it perform poorly or reveal sensitive information when triggered.

How can data poisoning be prevented?

Prevention involves rigorous data sanitization, checking the provenance of training data, and using robust training techniques that are less sensitive to outliers.
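One piece of the prevention picture, provenance checking combined with deduplication, can be sketched as follows. This is a simplified illustration: the document format (`url`/`text` dicts), the `trusted_domains` allowlist, and the hash-based fingerprint are all assumptions, and real pipelines use far more sophisticated filtering.

```python
def sanitize_corpus(documents, trusted_domains):
    """Toy provenance filter: keep only documents from trusted sources
    and drop exact duplicates (repetition can amplify poisoned content)."""
    seen = set()
    kept = []
    for doc in documents:
        host = doc["url"].split("/")[2]  # naive host extraction
        if host not in trusted_domains:
            continue  # unknown provenance: exclude from training
        fingerprint = hash(doc["text"])
        if fingerprint in seen:
            continue  # exact duplicate: keep only the first copy
        seen.add(fingerprint)
        kept.append(doc)
    return kept
```

Filters like this raise the cost of an attack, but they are not sufficient on their own, since poisoned content can also appear on otherwise trustworthy sites.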

Continue Learning

Explore our lessons and prompts to deepen your AI knowledge.