arXiv 2509.21240

Tree Search for LLM Agent Reinforcement Learning

By Yuxiang Ji, Ziyu Ma, et al.

Published 2025-09-25

Recent advances in reinforcement learning (RL) have significantly enhanced the agentic capabilities of large language models (LLMs). In long-horizon, multi-turn agent tasks, existing approaches driven solely by outcome rewards often suffer from sparse supervision. To address this challenge, we propose Tree-based Group Relative Policy Optimization (Tree-GRPO), a grouped agent RL method based on tree sea…
