arXiv 2503.06378
General Scales Unlock AI Evaluation with Explanatory and Predictive Power
By Lexin Zhou, Lorenzo Pacchiardi, et al.
Published 2025-03-09
Ensuring safe and effective use of AI requires understanding and anticipating its performance on novel tasks, from advanced scientific challenges to transformed workplace activities. So far, benchmarking has guided progress in AI, but it has offered limited explanatory and predictive power for general-purpose AI systems, given the low transferability across diverse tasks. In this paper, we introduce general scales f…