arXiv 2510.24932
RiddleBench: A New Generative Reasoning Benchmark for LLMs
By Deepon Halder, Alan Saji, et al.
Published 2025-10-28
Large Language Models have demonstrated strong performance on many established reasoning benchmarks. However, these benchmarks primarily evaluate structured skills like quantitative problem-solving, leaving a gap in assessing flexible, multifaceted reasoning abilities that are central to human intelligence. These abilities require integrating logical deduction with spatial awareness and constraint satisfaction, whic…