arXiv 2510.24932
RiddleBench: A New Generative Reasoning Benchmark for LLMs
By Deepon Halder, Alan Saji, et al.
Published 2025-10-28
Large Language Models have demonstrated strong performance on many established reasoning benchmarks. However, these benchmarks primarily evaluate structured skills like quantitative problem-solving, leaving a gap in assessing flexible, multifaceted reasoning abilities that are central to human intelligence. These abilities require integrating logical deduction with spatial awareness and constraint satisfaction, whic…