arXiv 2404.05692
Evaluating Mathematical Reasoning Beyond Accuracy
By Shijie Xia, Xuefeng Li, et al.
Published 2024-04-08
The leaderboard of Large Language Models (LLMs) in mathematical tasks has been continuously updated. However, the majority of evaluations focus solely on the final results, neglecting the quality of the intermediate steps. This oversight can mask underlying problems, such as logical errors or unnecessary steps in the reasoning process. To measure reasoning beyond final-answer accuracy, we introduce ReasonEval, a new…