arXiv 2510.24932

RiddleBench: A New Generative Reasoning Benchmark for LLMs

By Deepon Halder, Alan Saji, et al.

Published 2025-10-28

Large Language Models have demonstrated strong performance on many established reasoning benchmarks. However, these benchmarks primarily evaluate structured skills like quantitative problem-solving, leaving a gap in assessing flexible, multifaceted reasoning abilities that are central to human intelligence. These abilities require integrating logical deduction with spatial awareness and constraint satisfaction, whic…
