arXiv 2506.00309
Evaluation of LLMs for mathematical problem solving
By Ruonan Wang, Runxi Wang, et al.
Published 2025-05-30
Large Language Models (LLMs) have shown impressive performance on a range of educational tasks, but their potential to solve mathematical problems remains understudied. In this study, we compare three prominent LLMs (GPT-4o, DeepSeek-V3, and Gemini-2.0) on three mathematics datasets of varying complexity (GSM8K, MATH500, and MIT Open Courseware). We take a five-dimensional approach based o…