arXiv 2510.24932

RiddleBench: A New Generative Reasoning Benchmark for LLMs

By Deepon Halder, Alan Saji, et al.

Published 2025-10-28

Large Language Models have demonstrated strong performance on many established reasoning benchmarks. However, these benchmarks primarily evaluate structured skills like quantitative problem-solving, leaving a gap in assessing flexible, multifaceted reasoning abilities that are central to human intelligence. These abilities require integrating logical deduction with spatial awareness and constraint satisfaction, whic…
