arXiv 2406.14051

How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics

By Nidhir Bhavsar, Jonathan Jordan, et al.

Published 2024-06-20

What makes a good Large Language Model (LLM)? That it performs well on the relevant benchmarks -- which hopefully measure, with some validity, the presence of capabilities that are also challenged in real application. But what makes the model perform well? What gives a model its abilities? We take a recently introduced type of benchmark that is meant to challenge capabilities in a goal-directed, agentive context thr…
