arXiv 2405.00451

Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning

By Yuxi Xie, Anirudh Goyal, et al.

Published 2024-05-01

Wiki summary


We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process inspired by the successful strategy employed by AlphaZero. Our work leverages Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals. To enhanc…
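As a rough illustration of what "breaking instance-level rewards into step-level signals" could look like, the sketch below takes a small search tree whose nodes already carry value estimates and turns sibling comparisons into step-level preference pairs. Everything here is an assumption made for illustration: the `Node` class, the `q` field, and the `step_preferences` helper are hypothetical names, the values are assumed to come from some earlier search or rollout phase, and the highest-versus-lowest pairing rule is just one simple choice, not necessarily the paper's procedure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One reasoning step in a (hypothetical) search tree."""
    step: str                      # text of the reasoning step
    q: float = 0.0                 # value estimate, assumed to come from search
    children: list["Node"] = field(default_factory=list)


def step_preferences(root: Node) -> list[tuple[tuple[str, ...], str, str]]:
    """Follow the highest-value path and emit (prefix, chosen, rejected) triples.

    At each depth with at least two candidate steps, the highest-valued
    sibling is treated as "chosen" and the lowest-valued as "rejected".
    Ties produce no pair, since they carry no preference signal.
    """
    pairs = []
    prefix: list[str] = []
    node = root
    while node.children:
        ranked = sorted(node.children, key=lambda c: c.q, reverse=True)
        chosen, rejected = ranked[0], ranked[-1]
        if chosen.q > rejected.q:
            pairs.append((tuple(prefix), chosen.step, rejected.step))
        prefix.append(chosen.step)
        node = chosen
    return pairs


if __name__ == "__main__":
    # Toy tree with made-up values, purely for demonstration.
    tree = Node("question", children=[
        Node("step A", 0.9, [Node("step A1", 0.2), Node("step A2", 0.8)]),
        Node("step B", 0.1),
    ])
    for prefix, chosen, rejected in step_preferences(tree):
        print(prefix, "| prefer:", chosen, "| over:", rejected)
```

The point of the sketch is only that a single end-of-solution reward, once propagated into per-node values, yields one comparison per depth rather than one per full solution; such pairs could then feed a preference-learning objective.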
