arXiv 2510.17431

Agentic Reinforcement Learning for Search is Unsafe

By Yushi Yang, Shreyansh Padarha, et al.

Published 2025-10-20

Agentic reinforcement learning (RL) trains large language models to autonomously call tools during reasoning, with search as the most common application. These models excel at multi-step reasoning tasks, but their safety properties are not well understood. In this study, we show that RL-trained search models inherit refusal behavior from instruction tuning and often deflect harmful requests by turning them into safe queries.…
