arXiv 2511.04502

RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG

By Joshua Gao, Quoc Huy Pham, et al.

Published 2025-11-06

Retrieval-Augmented Generation (RAG) is a critical technique for grounding Large Language Models (LLMs) in factual evidence, yet evaluating RAG systems in specialized, safety-critical domains remains a significant challenge. Existing evaluation frameworks often rely on heuristic-based metrics that fail to capture domain-specific nuances, while other works use LLM-as-a-Judge approaches that lack validated alignment…
