arXiv 2503.16416

Survey on Evaluation of LLM-based Agents

By Asaf Yehudai, Lilach Eden, et al.

Published 2025-03-20

Wiki summary

The emergence of LLM-based agents represents a paradigm shift in AI, enabling autonomous systems to plan, reason, use tools, and maintain memory while interacting with dynamic environments. This paper provides the first comprehensive survey of evaluation methodologies for these increasingly capable agents. We systematically analyze evaluation benchmarks and frameworks across four critical dimensions: (1) fundamental…
