arXiv 2110.14168
Training Verifiers to Solve Math Word Problems
By Karl Cobbe, Vineet Kosaraju, et al.
Published 2021-10-27
Mindmap
Browse the paper's core ideas, clusters, and relationships in a structured outline.
State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning. To diagnose the failures of current models and support research, we introduce GSM8K, a dataset of 8.5K high quality linguistically diverse grade school math word problems. We find that even the largest transformer models fail to achieve high test performance, desp…