arXiv 2505.15146

lmgame-Bench: How Good are LLMs at Playing Games?

By Lanxiang Hu, Mingjia Huo, et al.

Published 2025-05-21

Playing video games requires perception, memory, and planning, exactly the faculties modern large language model (LLM) agents are expected to master. We study the major challenges in using popular video games to evaluate modern LLMs and find that dropping LLMs directly into games does not yield an effective evaluation, for three reasons: brittle visual perception, prompt sensitivity, and potential data contamination.…
