arXiv 2510.17431
Agentic Reinforcement Learning for Search is Unsafe
By Yushi Yang, Shreyansh Padarha, et al.
Published 2025-10-20
Agentic reinforcement learning (RL) trains large language models to autonomously call tools during reasoning, with search as the most common application. These models excel at multi-step reasoning tasks, but their safety properties are not well understood. In this study, we show that RL-trained search models inherit refusal from instruction tuning and often deflect harmful requests by turning them into safe queries.…