RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs

2024-07-02

Yue Yu, Wei Ping, Zihan Liu, Boxin Wang, Jiaxuan You, Chao Zhang, Mohammad Shoeybi, Bryan Catanzaro


Abstract

Large language models (LLMs) typically use the top-k contexts from a retriever in retrieval-augmented generation (RAG). In this work, we propose a novel instruction fine-tuning framework, RankRAG, which instruction-tunes a single LLM for the dual purpose of context ranking and answer generation in RAG. In particular, the instruction-tuned LLMs work surprisingly well when a small fraction of ranking data is added to the training blend, and they outperform existing expert ranking models, including the same LLM fine-tuned exclusively on a large amount of ranking data. For generation, we compare our model with many strong baselines, including GPT-4-0613, GPT-4-turbo-2024-0409, and ChatQA-1.5, an open-source model with state-of-the-art performance on RAG benchmarks. Specifically, our Llama3-RankRAG significantly outperforms Llama3-ChatQA-1.5 and GPT-4 models on nine knowledge-intensive benchmarks. It also performs comparably to GPT-4 on five RAG benchmarks in the biomedical domain without instruction fine-tuning on biomedical data, demonstrating strong generalization to new domains.
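The abstract describes one model doing two jobs at inference time: reranking the retriever's candidates and answering from the passages that survive. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. `score_fn` and `generate_fn` are hypothetical hooks standing in for the two uses of the single instruction-tuned LLM, and `overlap_score` / `first_context_answer` are toy placeholders (lexical overlap instead of an LLM relevance judgment, echoing the top context instead of generating) so the example runs on its own.

```python
import re
from typing import Callable, List, Tuple


def _tokens(text: str) -> set:
    # Lowercased word tokens; punctuation is dropped by the \w+ pattern.
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(question: str, passage: str) -> float:
    # Hypothetical toy relevance score (NOT RankRAG's): fraction of question
    # tokens that also appear in the passage. RankRAG would use the LLM here.
    q = _tokens(question)
    return len(q & _tokens(passage)) / (len(q) or 1)


def first_context_answer(question: str, contexts: List[str]) -> str:
    # Hypothetical toy generator: returns the highest-ranked context verbatim.
    return contexts[0] if contexts else ""


def rank_then_generate(
    question: str,
    passages: List[str],
    score_fn: Callable[[str, str], float],
    generate_fn: Callable[[str, List[str]], str],
    k: int = 5,
) -> Tuple[List[str], str]:
    # Step 1 (ranking): score every retrieved passage against the question.
    scored = [(score_fn(question, p), i, p) for i, p in enumerate(passages)]
    # Sort by descending score; ties fall back to the retriever's original order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    top_k = [p for _, _, p in scored[:k]]
    # Step 2 (generation): answer using only the reranked top-k contexts.
    return top_k, generate_fn(question, top_k)


passages = [
    "Paris is the capital of France.",
    "Berlin is in Germany.",
    "The capital of France is Paris, on the Seine.",
]
top, answer = rank_then_generate(
    "What is the capital of France?", passages, overlap_score, first_context_answer, k=2
)
print(top)
print(answer)
```

With the toy scorer, the off-topic Berlin passage drops out of the top 2, and the retriever's order breaks the tie between the two France passages. Replacing `overlap_score` with a relevance score produced by the same LLM that generates the answer is the part RankRAG actually trains for.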

Tasks

Answer Generation, Question Answering, RAG, Retrieval, Retrieval-augmented Generation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Natural Questions	RankRAG-llama3-70b (Zero-Shot, KILT)	EM	54.2	—	Unverified
Natural Questions	RankRAG-llama3-8b (Zero-Shot, KILT)	EM	50.6	—	Unverified
Natural Questions	RankRAG-llama3-70b (Zero-Shot, DPR)	EM	50	—	Unverified
Natural Questions	RankRAG-llama3-8b (Zero-Shot, DPR)	EM	46.1	—	Unverified
PubMedQA	RankRAG-llama3-70B (Zero-Shot)	Accuracy	79.8	—	Unverified
TriviaQA	RankRAG-llama3-70b (Zero-Shot, KILT)	EM	86.5	—	Unverified
TriviaQA	RankRAG-llama3-8b (Zero-Shot, KILT)	EM	82.9	—	Unverified
TriviaQA	RankRAG-llama3-70b (Zero-Shot, DPR)	EM	72.6	—	Unverified
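The Natural Questions and TriviaQA rows report exact match (EM). This page does not state the exact answer normalization behind these numbers, so the sketch below assumes the widely used SQuAD-style convention (lowercase, strip punctuation, drop the articles "a", "an", "the", collapse whitespace) and counts a prediction as correct if it matches any gold answer. The function names are illustrative, and RankRAG's own evaluation script may normalize differently.

```python
import re
import string
from typing import Iterable, List, Sequence

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    # Assumed SQuAD-style normalization: lowercase, remove punctuation,
    # remove English articles, then collapse runs of whitespace.
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold_answers: Iterable[str]) -> float:
    # 1.0 if the normalized prediction equals any normalized gold answer.
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(g) for g in gold_answers))


def em_score(predictions: Sequence[str], references: Sequence[List[str]]) -> float:
    # Corpus-level EM as a percentage, the scale used in the table above.
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    total = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * total / len(predictions)


print(em_score(["The Eiffel Tower!", "Lyon"], [["eiffel tower"], ["Marseille"]]))
```

Because several gold aliases can be listed per question, EM here is an "any alias matches" test; a stricter or looser normalization would shift the reported percentages.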
