arXiv 2507.17746
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
By Anisha Gunjal, Anthony Wang, et al.
Published 2025-07-23
Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for complex reasoning tasks with clear correctness signals such as math and coding. However, extending it to real-world reasoning tasks is challenging, as evaluation depends on nuanced, multi-criteria judgments rather than binary correctness. Instance-specific rubrics have recently been used in evaluation benchmarks to capture such judgments,…