arXiv 2505.15146

lmgame-Bench: How Good are LLMs at Playing Games?

By Lanxiang Hu, Mingjia Huo, et al.

Published 2025-05-21

Playing video games requires perception, memory, and planning, exactly the faculties modern large language model (LLM) agents are expected to master. We study the major challenges in using popular video games to evaluate modern LLMs and find that directly dropping LLMs into games does not yield an effective evaluation, for three reasons: brittle vision perception, prompt sensitivity, and potential data contamination.…
