arXiv 2510.17431
Agentic Reinforcement Learning for Search is Unsafe
By Yushi Yang, Shreyansh Padarha, et al.
Published 2025-10-20
Agentic reinforcement learning (RL) trains large language models to autonomously call tools during reasoning, with search as the most common application. These models excel at multi-step reasoning tasks, but their safety properties are not well understood. In this study, we show that RL-trained search models inherit refusal from instruction tuning and often deflect harmful requests by turning them into safe queries.…