arXiv 2507.17746

Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains

By Anisha Gunjal, Anthony Wang, et al.

Published 2025-07-23

Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for complex reasoning tasks with clear correctness signals such as math and coding. However, extending it to real-world reasoning tasks is challenging, as evaluation depends on nuanced, multi-criteria judgments rather than binary correctness. Instance-specific rubrics have recently been used in evaluation benchmarks to capture such judgments,…
