arXiv 2110.14168
Training Verifiers to Solve Math Word Problems
By Karl Cobbe, Vineet Kosaraju, et al.
Published 2021-10-27
Citation lineage
Review the prior work and downstream research connected to this paper.
State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning. To diagnose the failures of current models and support research, we introduce GSM8K, a dataset of 8.5K high quality linguistically diverse grade school math word problems. We find that even the largest transformer models fail to achieve high test performance, desp…