arXiv 2510.17431

Agentic Reinforcement Learning for Search is Unsafe

By Yushi Yang, Shreyansh Padarha, et al.

Published 2025-10-20

Agentic reinforcement learning (RL) trains large language models to autonomously call tools during reasoning, with search as the most common application. These models excel at multi-step reasoning tasks, but their safety properties are not well understood. In this study, we show that RL-trained search models inherit refusal from instruction tuning and often deflect harmful requests by turning them into safe queries.…
