arXiv 2510.07192
Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples
By Alexandra Souly, Javier Rando, et al.
Published 2025-10-08
Poisoning attacks can compromise the safety of large language models (LLMs) by injecting malicious documents into their training data. Existing work has studied pretraining poisoning assuming adversaries control a percentage of the training corpus. However, for large models, even small percentages translate to impractically large amounts of data. This work demonstrates for the first time that poisoning attacks inste…