arXiv 2405.20859

clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents

By Anne Beyer, Kranti Chalamalasetti, et al.

Published 2024-05-31

It has been established in recent work that Large Language Models (LLMs) can be prompted to "self-play" conversational games that probe certain capabilities (general instruction following, strategic goal orientation, language understanding abilities), where the resulting interactive game play can be automatically scored. In this paper, we take one of the proposed frameworks for setting up such game-play environments…