Evalugator🐊: Rapid, Agile Development and Evaluation of Retrieval Augmented Generation Systems Without Labels

Accepted at ECIR-2026, 2026

Recommended citation: Bevan Koopman, Hang Li, Shuai Wang and Guido Zuccon. 2025. Evalugator🐊: Rapid, Agile Development and Evaluation of Retrieval Augmented Generation Systems Without Labels (Accepted ECIR-2026). https://hangli.me/files/ecir2026-evalugator.pdf

Abstract

Evaluating complex Retrieval Augmented Generation (RAG) systems in real-world settings is challenging. There is often a lack of fine-grained labelled data and an absence of comprehensive evaluation tools that can assess individual components of a pipeline. This hinders rapid, rigorous development, particularly for agentic RAG systems. We share our experience at GuideStream.AI, a startup developing an AI for clinical guideline recommendation. To address this gap, we developed Evalugator, a suite of agentic components to support agile development and evaluation. Evalugator features: (1) generation of synthetic queries, relevance assessments, answers and evaluation criteria for training and evaluation in new domains; (2) LLM-based judging agents; and (3) simple UI and API tools to launch experiments and analyse results. This paper uses Evalugator as a case study to demonstrate how a principled, agent-based evaluation framework can support the rapid development of complex RAG systems in a startup environment.