arXiv:2509.21240

Tree Search for LLM Agent Reinforcement Learning

By Yuxiang Ji, Ziyu Ma, et al.

Published 2025-09-25

Recent advances in reinforcement learning (RL) have significantly enhanced the agentic capabilities of large language models (LLMs). In long-term and multi-turn agent tasks, existing approaches driven solely by outcome rewards often suffer from sparse supervision. To address this challenge, we propose Tree-based Group Relative Policy Optimization (Tree-GRPO), a grouped agent RL method based on tree sea…