arXiv 2510.07192
Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples
By Alexandra Souly, Javier Rando, et al.
Published 2025-10-08
Poisoning attacks can compromise the safety of large language models (LLMs) by injecting malicious documents into their training data. Existing work has studied pretraining poisoning assuming adversaries control a percentage of the training corpus. However, for large models, even small percentages translate to impractically large amounts of data. This work demonstrates for the first time that poisoning attacks inste…
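The abstract argues that a percentage-based threat model becomes unrealistic as training corpora grow. The sketch below makes that arithmetic concrete by comparing two poison budgets. The first scales with corpus size, as in the fixed-percentage setting. The second stays near-constant, as the title claims. The corpus sizes, the 0.1% poisoning rate, and the constant budget of 100 documents are hypothetical values chosen for illustration. They are not numbers reported in the paper, and the helper functions are likewise hypothetical.

```python
# Hypothetical illustration: how many poisoned documents an adversary needs
# under a fixed-percentage threat model versus a fixed-count threat model.
# All sizes and rates below are illustrative assumptions, not paper results.

def poison_docs_fixed_fraction(corpus_docs: int, fraction: float) -> int:
    """Poisoned documents needed if the adversary must control `fraction` of the corpus."""
    return round(corpus_docs * fraction)


def poison_docs_fixed_count(corpus_docs: int, count: int) -> int:
    """Poisoned documents under a near-constant-count assumption.

    `corpus_docs` is accepted only for symmetry; the budget does not depend on it.
    """
    return count


# Hypothetical corpus sizes, measured in documents.
CORPUS_SIZES = [10**6, 10**8, 10**10]
FRACTION = 0.001  # hypothetical 0.1% poisoning rate
COUNT = 100       # hypothetical constant poison budget

for n in CORPUS_SIZES:
    frac = poison_docs_fixed_fraction(n, FRACTION)
    const = poison_docs_fixed_count(n, COUNT)
    print(f"{n:>14,} docs: fixed 0.1% -> {frac:>12,}   fixed count -> {const}")
```

Under the percentage model, the required number of poisoned documents grows linearly with the corpus. Under the constant-count model, it stays flat. This difference is why the constant-count model describes a much more practical attack on large models.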