arXiv 2207.05221

Language Models (Mostly) Know What They Know

By Saurav Kadavath, Tom Conerly, et al.

Published 2022-07-11

Wiki summary


We study whether language models can evaluate the validity of their own claims and predict which questions they will be able to answer correctly. We first show that larger models are well-calibrated on diverse multiple-choice and true/false questions when they are provided in the right format. Thus we can approach self-evaluation on open-ended sampling tasks by asking models to first propose answers, and then to eva…
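To make "well-calibrated" concrete: a model is calibrated when, among the answers it assigns probability p, roughly a fraction p turn out to be correct. The sketch below computes a standard equal-width-bin expected calibration error (ECE) from per-question confidences and correctness labels. It is an illustrative assumption, not the paper's evaluation code; the function name `expected_calibration_error`, its parameters, and the choice of 10 bins are hypothetical.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE: weighted mean of |avg confidence - accuracy| per bin.

    Hypothetical helper for illustration; confidences are assumed to lie in [0, 1].
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if len(confidences) == 0:
        raise ValueError("need at least one prediction")

    # Per-bin running totals: sum of confidences, number correct, count.
    conf_sum = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins

    for p, ok in zip(confidences, correct):
        # Clamp the index so that p == 1.0 falls into the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        conf_sum[b] += p
        hits[b] += int(ok)
        counts[b] += 1

    total = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        if counts[b] == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum[b] / counts[b]
        accuracy = hits[b] / counts[b]
        ece += (counts[b] / total) * abs(avg_conf - accuracy)
    return ece
```

A value near 0 means stated confidence tracks observed accuracy; for example, a model that says 0.9 on questions it gets right only half the time would score about 0.4 on those questions.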
