arXiv 2510.23948

ChessQA: Evaluating Large Language Models for Chess Understanding

By Qianfeng Wen, Zhenwei Tang, et al.

Published 2025-10-28

Chess provides an ideal testbed for evaluating the reasoning, modeling, and abstraction capabilities of large language models (LLMs), as it has well-defined structure and objective ground truth while admitting a wide spectrum of skill levels. However, existing evaluations of LLM ability in chess are ad hoc and narrow in scope, making it difficult to accurately measure LLM chess understanding and how it varies with s…
