arXiv 2510.24932

RiddleBench: A New Generative Reasoning Benchmark for LLMs

By Deepon Halder, Alan Saji, et al.

Published 2025-10-28

Large Language Models have demonstrated strong performance on many established reasoning benchmarks. However, these benchmarks primarily evaluate structured skills like quantitative problem-solving, leaving a gap in assessing flexible, multifaceted reasoning abilities that are central to human intelligence. These abilities require integrating logical deduction with spatial awareness and constraint satisfaction, whic…
