arXiv 2501.00274

LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts

By Helia Hashemi, Jason Eisner, et al.

Published 2024-12-31

This paper introduces a framework for the automated evaluation of natural language texts. A manually constructed rubric describes how to assess multiple dimensions of interest. To evaluate a text, a large language model (LLM) is prompted with each rubric question and produces a distribution over potential responses. The LLM predictions often fail to agree well with human judges; indeed, the humans do not fully agr…
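The per-question step described above, where the LLM "produces a distribution over potential responses", can be illustrated with a minimal sketch. It assumes you already have the log-probability the LLM assigns to each allowed answer option for a rubric question (for example, read from token log-probs). The input format, the names `option_distribution` and `evaluate_text`, and the question ID `q_relevance` are hypothetical. The sketch renormalizes those scores with a softmax restricted to the options. It does not attempt to reproduce the paper's full method.

```python
import math
from typing import Dict

# Hypothetical input format: for each rubric question, the log-probability the
# LLM assigns to each allowed answer option (e.g., read from token log-probs).
OptionLogProbs = Dict[str, float]


def option_distribution(option_logprobs: OptionLogProbs) -> Dict[str, float]:
    """Renormalize option log-probabilities into a distribution (a softmax).

    Probability mass the LLM places on tokens outside the allowed options is
    discarded, so the returned probabilities sum to 1.
    """
    if not option_logprobs:
        raise ValueError("need at least one answer option")
    # Subtract the max log-prob before exponentiating for numerical stability.
    peak = max(option_logprobs.values())
    weights = {opt: math.exp(lp - peak) for opt, lp in option_logprobs.items()}
    total = sum(weights.values())
    return {opt: w / total for opt, w in weights.items()}


def evaluate_text(rubric: Dict[str, OptionLogProbs]) -> Dict[str, Dict[str, float]]:
    """Return one distribution over answer options per rubric question."""
    return {question: option_distribution(lps) for question, lps in rubric.items()}


if __name__ == "__main__":
    # Hypothetical rubric question ID and log-probs for a 4-point answer scale.
    rubric = {
        "q_relevance": {
            "1": math.log(0.1),
            "2": math.log(0.2),
            "3": math.log(0.3),
            "4": math.log(0.2),
        }
    }
    print(evaluate_text(rubric)["q_relevance"])
```

In this toy input the four options cover only 0.8 of the LLM's probability mass. Renormalizing gives option "3" a probability of 0.3 / 0.8 = 0.375 and option "1" a probability of 0.1 / 0.8 = 0.125. Restricting to the rubric's options keeps each question's output a proper distribution over its possible responses.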

View the original paper on arXiv