arXiv:2405.20859

clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents

By Anne Beyer, Kranti Chalamalasetti, et al.

Published 2024-05-31

It has been established in recent work that Large Language Models (LLMs) can be prompted to "self-play" conversational games that probe certain capabilities (general instruction following, strategic goal orientation, language understanding abilities), where the resulting interactive game play can be automatically scored. In this paper, we take one of the proposed frameworks for setting up such game-play environments…