arXiv 2405.00451

Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning

By Yuxi Xie, Anirudh Goyal, et al.

Published 2024-05-01

We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process inspired by the successful strategy employed by AlphaZero. Our work leverages Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals. To enhanc…
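As a rough illustration of what "breaking down instance-level rewards into more granular step-level signals" can mean, the toy sketch below averages final-answer rewards up a small hand-built reasoning tree and then compares sibling steps. It is not the paper's implementation: there is no actual search or selection policy here, and the names `Node`, `value`, `step_preferences`, and the `margin` parameter are hypothetical, introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    step: str                                   # text of one reasoning step
    children: List["Node"] = field(default_factory=list)
    outcome: Optional[float] = None             # instance-level reward, leaves only


def leaf_outcomes(node: Node) -> List[float]:
    """Collect the final rewards of every completed path below `node`."""
    if not node.children:
        return [float(node.outcome)]
    found: List[float] = []
    for child in node.children:
        found.extend(leaf_outcomes(child))
    return found


def value(node: Node) -> float:
    """Look-ahead estimate for a step: mean reward of the paths it leads to."""
    outcomes = leaf_outcomes(node)
    return sum(outcomes) / len(outcomes)


def step_preferences(
    node: Node, prefix: Tuple[str, ...] = (), margin: float = 0.0
) -> List[Tuple[Tuple[str, ...], str, str]]:
    """Return (prefix, chosen_step, rejected_step) triples, one per branching point.

    Assumption: the best and worst sibling form a pair only when their
    estimated values differ by more than `margin`.
    """
    prefix = prefix + (node.step,)
    pairs = []
    if len(node.children) >= 2:
        best = max(node.children, key=value)
        worst = min(node.children, key=value)
        if value(best) - value(worst) > margin:
            pairs.append((prefix, best.step, worst.step))
    for child in node.children:
        pairs.extend(step_preferences(child, prefix, margin))
    return pairs


# Toy tree: only the leaves carry a reward (1.0 = correct final answer).
tree = Node("Q: compute 2 * (2 + 3)", [
    Node("2 + 3 = 5", [
        Node("2 * 5 = 10", outcome=1.0),
        Node("2 * 5 = 25", outcome=0.0),
    ]),
    Node("2 + 3 = 6", [
        Node("2 * 6 = 12", outcome=0.0),
    ]),
])

pairs = step_preferences(tree)
for path, chosen, rejected in pairs:
    print(" -> ".join(path), "| prefer:", chosen, "| over:", rejected)
```

In this toy tree the step "2 + 3 = 5" is preferred over "2 + 3 = 6" even though neither step carries a reward of its own; the preference comes entirely from the outcomes reachable below each step.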
