arXiv 2503.06378

General Scales Unlock AI Evaluation with Explanatory and Predictive Power

By Lexin Zhou, Lorenzo Pacchiardi, et al.

Published 2025-03-09

Ensuring safe and effective use of AI requires understanding and anticipating its performance on novel tasks, from advanced scientific challenges to transformed workplace activities. So far, benchmarking has guided progress in AI, but it has offered limited explanatory and predictive power for general-purpose AI systems, given the low transferability across diverse tasks. In this paper, we introduce general scales f…
