RAGXplain: From Explainable Evaluation to Actionable Guidance of RAG Pipelines

2025-05-18

Dvir Cohen, Lin Burg, Gilad Barkan

Abstract

Retrieval-Augmented Generation (RAG) systems show promise by coupling large language models with external knowledge, yet traditional RAG evaluation methods primarily report quantitative scores while offering limited actionable guidance for refining these complex pipelines. In this paper, we introduce RAGXplain, an evaluation framework that quantifies RAG performance and translates these assessments into clear insights that clarify the workings of its complex, multi-stage pipeline and offer actionable recommendations. Using LLM reasoning, RAGXplain converts raw scores into coherent narratives identifying performance gaps and suggesting targeted improvements. By providing transparent explanations for AI decision-making, our framework fosters user trust, a key challenge in AI adoption. Our LLM-based metric assessments show strong alignment with human judgments, and experiments on public question-answering datasets confirm that applying RAGXplain's actionable recommendations measurably improves system performance. RAGXplain thus bridges quantitative evaluation and practical optimization, empowering users to understand, trust, and enhance their AI systems.
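
The score-to-narrative step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the metric names, the 0.7 threshold, and the functions `find_gaps` and `build_explanation_prompt` are all hypothetical, chosen only to show how raw scores might be turned into a prompt that asks an LLM for explanations and recommendations. No LLM is called here.

```python
from typing import Dict, List


def find_gaps(scores: Dict[str, float], threshold: float = 0.7) -> List[str]:
    """Return the names of metrics scoring below the threshold, worst first.

    The threshold and metric names are illustrative assumptions,
    not values taken from the paper.
    """
    low = [name for name, value in scores.items() if value < threshold]
    return sorted(low, key=lambda name: scores[name])


def build_explanation_prompt(scores: Dict[str, float], threshold: float = 0.7) -> str:
    """Compose a prompt asking an LLM to explain gaps and suggest fixes."""
    gaps = find_gaps(scores, threshold)
    score_lines = [f"- {name}: {value:.2f}" for name, value in scores.items()]
    gap_text = ", ".join(gaps) if gaps else "none"
    return (
        "You are reviewing a RAG pipeline evaluation.\n"
        "Metric scores:\n" + "\n".join(score_lines) + "\n"
        f"Metrics below {threshold:.2f}: {gap_text}\n"
        "Explain the likely causes and suggest targeted improvements."
    )


if __name__ == "__main__":
    # Hypothetical scores for a single evaluation run.
    example_scores = {
        "context_relevance": 0.45,
        "faithfulness": 0.90,
        "answer_relevance": 0.60,
    }
    print(build_explanation_prompt(example_scores))
```

The resulting prompt string would be passed to whichever LLM the evaluator uses; ordering the gaps worst first is one possible design choice for focusing the narrative on the weakest pipeline stage.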

Tasks

Decision Making, Question Answering, RAG, Retrieval-augmented Generation
