arXiv 2601.18777
PRECISE: Reducing the Bias of LLM Evaluations Using Prediction-Powered Ranking Estimation
By Abhishek Divekar and Anirban Majumder
Published 2026-01-26
Evaluating the quality of search, ranking, and RAG systems traditionally requires a large number of human relevance annotations. Several deployed systems have recently explored using Large Language Models (LLMs) as automated judges for this task, but the judges' inherent biases prevent their direct use for metric estimation. We present a statistical framework extending Prediction-Powered Inference (PPI) tha…
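For readers unfamiliar with PPI, the following is a minimal sketch of the standard PPI point estimate of a mean, not the PRECISE extension this paper proposes. It assumes a small set of items scored by both humans and an LLM judge, plus a larger set scored by the judge alone. The function name `ppi_mean` and all scores are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def ppi_mean(labeled_human, labeled_judge, unlabeled_judge):
    """Standard PPI point estimate of a mean (illustrative sketch only).

    labeled_human   -- human scores on the small labeled set
    labeled_judge   -- LLM-judge scores on the same labeled items
    unlabeled_judge -- LLM-judge scores on the large unlabeled set
    """
    if len(labeled_human) != len(labeled_judge):
        raise ValueError("labeled human and judge scores must be paired")
    if not labeled_human or not unlabeled_judge:
        raise ValueError("both the labeled and unlabeled sets must be non-empty")

    # Rectifier: the judge's average error, measured where human labels exist.
    rectifier = fmean([h - j for h, j in zip(labeled_human, labeled_judge)])

    # Judge-only estimate on the large set, corrected by the rectifier.
    return fmean(unlabeled_judge) + rectifier


# Hypothetical binary relevance scores (1 = relevant, 0 = not relevant).
human = [1, 0, 1, 1]
judge_on_labeled = [1, 1, 1, 1]          # judge over-predicts relevance here
judge_on_unlabeled = [1, 1, 0, 1, 1, 1, 0, 1]

estimate = ppi_mean(human, judge_on_labeled, judge_on_unlabeled)
print(estimate)
```

In this toy data the judge alone would report a mean of 0.75 on the unlabeled set. The rectifier subtracts the 0.25 over-prediction observed on the labeled items.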