arXiv 2501.00274

LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts

By Helia Hashemi, Jason Eisner, et al.

Published 2024-12-31

This paper introduces a framework for the automated evaluation of natural language texts. A manually constructed rubric describes how to assess multiple dimensions of interest. To evaluate a text, a large language model (LLM) is prompted with each rubric question and produces a distribution over potential responses. The LLM predictions often fail to agree well with human judges -- indeed, the humans do not fully agr…
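The abstract says the LLM is prompted with each rubric question and produces a distribution over potential responses. The Python sketch below illustrates that step, assuming the model's log-probabilities for each allowed answer option are already available. The rubric questions, option labels, function names, and numbers are all hypothetical. Renormalizing the log-probabilities over only the listed options (a softmax) is one plausible way to get such a distribution and is not necessarily the paper's exact procedure.

```python
import math
from typing import Dict, List

# Hypothetical rubric: each question maps to its allowed answer options.
RUBRIC: Dict[str, List[str]] = {
    "Is the response factually accurate?": ["1", "2", "3", "4"],
    "Is the response concise?": ["1", "2", "3", "4"],
}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: shift by the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rubric_distributions(option_logprobs: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Map each rubric question to a probability distribution over its options.

    option_logprobs[q][i] is assumed to be the LLM's log-probability of
    answering option i to question q. How these scores are obtained from a
    real model is not shown here. Applying softmax to log-probabilities
    renormalizes them over the listed options only.
    """
    return {q: softmax(option_logprobs[q]) for q in RUBRIC}


if __name__ == "__main__":
    # Made-up log-probabilities standing in for real LLM outputs.
    fake = {
        "Is the response factually accurate?": [-3.0, -2.0, -0.5, -1.0],
        "Is the response concise?": [-0.2, -1.5, -2.5, -4.0],
    }
    for question, dist in rubric_distributions(fake).items():
        print(question, [round(p, 3) for p in dist])
```

Restricting the normalization to the rubric's own options keeps probability mass from leaking onto off-format answers, so each question yields a proper distribution that downstream components can consume.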