arXiv 2510.07192

Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples

By Alexandra Souly, Javier Rando, et al.

Published 2025-10-08

Wiki summary

Poisoning attacks can compromise the safety of large language models (LLMs) by injecting malicious documents into their training data. Existing work has studied pretraining poisoning assuming adversaries control a percentage of the training corpus. However, for large models, even small percentages translate to impractically large amounts of data. This work demonstrates for the first time that poisoning attacks inste…
