arXiv 2509.21240
Tree Search for LLM Agent Reinforcement Learning
By Yuxiang Ji, Ziyu Ma, et al.
Published 2025-09-25
Recent advances in reinforcement learning (RL) have significantly enhanced the agentic capabilities of large language models (LLMs). In long-term and multi-turn agent tasks, existing approaches driven solely by outcome rewards often suffer from sparse supervision. To address this challenge, we propose Tree-based Group Relative Policy Optimization (Tree-GRPO), a grouped agent RL method based on tree sea…