arXiv 2509.21240
Tree Search for LLM Agent Reinforcement Learning
By Yuxiang Ji, Ziyu Ma, et al.
Published 2025-09-25
Recent advances in reinforcement learning (RL) have significantly enhanced the agentic capabilities of large language models (LLMs). In long-term, multi-turn agent tasks, existing approaches driven solely by outcome rewards often suffer from sparse supervision. To address this challenge, we propose Tree-based Group Relative Policy Optimization (Tree-GRPO), a grouped agent RL method based on tree sea…