arXiv 2505.05445

clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations

By Chalamalasetti Kranti, Sherzod Hakimov, et al.

Published 2025-05-08

The emergence of instruction-tuned large language models (LLMs) has advanced the field of dialogue systems, enabling both realistic user simulations and robust multi-turn conversational agents. However, existing research often evaluates these components in isolation, focusing either on a single user simulator or on a specific system design, which limits the generalisability of insights across architectures and configurations…