arXiv 2405.20859
clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents
By Anne Beyer, Kranti Chalamalasetti, et al.
Published 2024-05-31
It has been established in recent work that Large Language Models (LLMs) can be prompted to "self-play" conversational games that probe certain capabilities (general instruction following, strategic goal orientation, language understanding abilities), where the resulting interactive game play can be automatically scored. In this paper, we take one of the proposed frameworks for setting up such game-play environments…