arXiv 2509.21240
Tree Search for LLM Agent Reinforcement Learning
By Yuxiang Ji, Ziyu Ma, et al.
Published 2025-09-25
Recent advances in reinforcement learning (RL) have significantly enhanced the agentic capabilities of large language models (LLMs). In long-term, multi-turn agent tasks, existing approaches driven solely by outcome rewards often suffer from sparse supervision. To address this challenge, we propose Tree-based Group Relative Policy Optimization (Tree-GRPO), a grouped agent RL method based on tree sea…