arXiv 2510.17431
Agentic Reinforcement Learning for Search is Unsafe
By Yushi Yang, Shreyansh Padarha, et al.
Published 2025-10-20
Agentic reinforcement learning (RL) trains large language models to autonomously call tools during reasoning, with search as the most common application. These models excel at multi-step reasoning tasks, but their safety properties are not well understood. In this study, we show that RL-trained search models inherit refusal from instruction tuning and often deflect harmful requests by turning them into safe queries.…