arXiv 2511.15304

Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models

By Piercosma Bisconti, Matteo Prandi, et al.

Published 2025-11-19

We present evidence that adversarial poetry functions as a universal single-turn jailbreak technique for Large Language Models (LLMs). Across 25 frontier proprietary and open-weight models, curated poetic prompts yielded high attack-success rates (ASR), with some providers exceeding 90%. Mapping prompts to MLCommons and EU CoP risk taxonomies shows that poetic attacks transfer across CBRN, manipulation, cyber-offenc…
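The abstract reports results as attack-success rates (ASR), with some providers exceeding 90%. As a minimal sketch, assuming ASR is the fraction of attack prompts whose responses a judge labels as unsafe (this summary does not describe the paper's judging protocol), per-provider ASR could be aggregated as below. The function names, the `(provider, unsafe)` input format and the toy labels are hypothetical illustrations, not data from the paper.

```python
from collections import defaultdict


def asr_by_provider(results):
    """Return {provider: attack-success rate} from (provider, unsafe) pairs.

    Hypothetical input format: `unsafe` is True when a judge labeled the
    model's response to an attack prompt as a successful attack.
    """
    totals = defaultdict(int)     # prompts evaluated per provider
    successes = defaultdict(int)  # responses judged unsafe per provider
    for provider, unsafe in results:
        totals[provider] += 1
        if unsafe:
            successes[provider] += 1
    # ASR = successful attacks / total attempts, computed per provider
    return {p: successes[p] / totals[p] for p in totals}


def providers_above(rates, threshold=0.9):
    """List providers whose ASR strictly exceeds `threshold` (e.g. 90%)."""
    return [p for p, rate in rates.items() if rate > threshold]


# Toy labels for illustration only (not results from the paper)
toy = [
    ("provider_a", True), ("provider_a", True),
    ("provider_a", True), ("provider_a", False),
    ("provider_b", False), ("provider_b", True),
]
rates = asr_by_provider(toy)
print(rates)                         # {'provider_a': 0.75, 'provider_b': 0.5}
print(providers_above(rates, 0.7))   # ['provider_a']
```

Grouping by provider rather than by individual model mirrors the abstract's provider-level framing; the same function works per model if the first element of each pair is a model identifier instead.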
