arXiv 2405.20859

clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents

By Anne Beyer, Kranti Chalamalasetti, et al.

Published 2024-05-31

It has been established in recent work that Large Language Models (LLMs) can be prompted to "self-play" conversational games that probe certain capabilities (general instruction following, strategic goal orientation, language understanding abilities), where the resulting interactive game play can be automatically scored. In this paper, we take one of the proposed frameworks for setting up such game-play environments…
