arXiv 2509.21240

Tree Search for LLM Agent Reinforcement Learning

By Yuxiang Ji, Ziyu Ma, et al.

Published 2025-09-25

Recent advances in reinforcement learning (RL) have significantly enhanced the agentic capabilities of large language models (LLMs). In long-horizon, multi-turn agent tasks, existing approaches driven solely by outcome rewards often suffer from sparse supervision. To address this challenge, we propose Tree-based Group Relative Policy Optimization (Tree-GRPO), a grouped agent RL method based on tree sea…
