ALoFTRAG: Automatic Local Fine Tuning for Retrieval Augmented Generation

2025-01-21

Peter Devine

Code

github.com/lightblue-tech/aloftrag
Official · In paper · Framework: none · ★ 6

Abstract

Retrieval Augmented Generation (RAG) systems have been shown to improve the accuracy of Large Language Model (LLM) outputs. However, these systems often achieve low accuracy when applied to new data domains. We introduce the Automatic Local Fine Tuning of Retrieval Augmented Generation models (ALoFTRAG) framework, designed to improve the accuracy of RAG systems on a given domain by training LLMs without manually labeled data or larger teacher models. By generating and filtering synthetic training data and performing LoRA fine-tuning, ALoFTRAG improves citation and answer accuracy across 20 datasets in 26 languages by an average of 8.3% and 3.0%, respectively. Our results demonstrate that ALoFTRAG offers a practical, cost-effective, and data-secure solution for improving RAG accuracy, making it particularly applicable to sensitive domains such as healthcare and finance.
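The abstract names two stages: generating and filtering synthetic training data, then LoRA fine-tuning. The sketch below illustrates only the data-preparation side under several loudly stated assumptions, and is not the implementation in the official repository. It assumes the local LLM has already produced question/answer pairs from unlabeled text chunks (the generation call is omitted). A Jaccard token overlap stands in for a real embedding model. The filter, which keeps a pair only if its question is most similar to its own source chunk, is hypothetical. The "Citation/Answer" target format and all names (`SyntheticQA`, `passes_filter`, `build_example`) are illustrative inventions.

```python
import random
import re
from dataclasses import dataclass


def tokens(text: str) -> set[str]:
    """Lowercased word set; a toy stand-in for a real embedding model (assumption)."""
    return set(re.findall(r"\w+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard overlap between token sets (hypothetical similarity measure)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class SyntheticQA:
    question: str
    answer: str
    source_idx: int  # index of the chunk the pair was generated from


def passes_filter(qa: SyntheticQA, chunks: list[str]) -> bool:
    """Hypothetical filter: keep the pair only if its question is closest to its own source chunk."""
    scores = [similarity(qa.question, c) for c in chunks]
    best = max(range(len(chunks)), key=scores.__getitem__)
    return best == qa.source_idx


def build_example(qa: SyntheticQA, chunks: list[str],
                  n_negatives: int = 2, seed: int = 0) -> dict[str, str]:
    """Mix the source chunk with its most similar other chunks (hard negatives)
    and format a target that cites the correct chunk, then answers."""
    others = [i for i in range(len(chunks)) if i != qa.source_idx]
    others.sort(key=lambda i: similarity(chunks[qa.source_idx], chunks[i]), reverse=True)
    picked = [qa.source_idx] + others[:n_negatives]
    random.Random(seed).shuffle(picked)  # hide the source chunk's position
    context = "\n".join(f"[{pos}] {chunks[i]}" for pos, i in enumerate(picked))
    prompt = f"{context}\n\nQuestion: {qa.question}"
    target = f"Citation: {picked.index(qa.source_idx)}\nAnswer: {qa.answer}"
    return {"prompt": prompt, "target": target}


# Usage with toy data: the second pair is unrelated to its claimed source and is filtered out.
chunks = [
    "The clinic opens at 9 am on weekdays.",
    "Invoices are due within 30 days of issue.",
    "The clinic is closed on public holidays.",
]
candidates = [
    SyntheticQA("When does the clinic open on weekdays?", "At 9 am.", 0),
    SyntheticQA("What is the capital of France?", "Paris.", 1),
]
kept = [qa for qa in candidates if passes_filter(qa, chunks)]
dataset = [build_example(qa, chunks, seed=k) for k, qa in enumerate(kept)]
```

The resulting prompt/target pairs are the kind of data a LoRA fine-tuning run would consume; that training step is not shown here and would typically be handled by a library such as Hugging Face PEFT.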

Tasks

Language Modeling, Large Language Model, RAG, Retrieval, Retrieval-augmented Generation
