arXiv 2510.07192
Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples
By Alexandra Souly, Javier Rando, et al.
Published 2025-10-08
Poisoning attacks can compromise the safety of large language models (LLMs) by injecting malicious documents into their training data. Existing work has studied pretraining poisoning assuming adversaries control a percentage of the training corpus. However, for large models, even small percentages translate to impractically large amounts of data. This work demonstrates for the first time that poisoning attacks inste…