arXiv 2510.17431

Agentic Reinforcement Learning for Search is Unsafe

By Yushi Yang, Shreyansh Padarha, et al.

Published 2025-10-20

Wiki summary

Agentic reinforcement learning (RL) trains large language models to autonomously call tools during reasoning, with search as the most common application. These models excel at multi-step reasoning tasks, but their safety properties are not well understood. In this study, we show that RL-trained search models inherit refusal from instruction tuning and often deflect harmful requests by turning them into safe queries.…
