arXiv 2406.14051
How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics
By Nidhir Bhavsar, Jonathan Jordan, et al.
Published 2024-06-20
What makes a good Large Language Model (LLM)? That it performs well on the relevant benchmarks -- which hopefully measure, with some validity, the presence of capabilities that are also challenged in real application. But what makes the model perform well? What gives a model its abilities? We take a recently introduced type of benchmark that is meant to challenge capabilities in a goal-directed, agentive context thr…