SF-RAG: Structure-Fidelity Retrieval-Augmented Generation for Academic Question Answering
Rui Yu, Tianyi Wang, Ruixia Liu, Yinglong Wang
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
Efficient question-answering (QA) over extensive scientific literature is essential for evidence-based engineering decision-making. Retrieval-augmented generation (RAG) is increasingly applied to question-answering over long academic papers, where accurate evidence allocation under a fixed token budget is critical. However, existing approaches flatten papers into unstructured chunks, destroying the native hierarchical structure and forcing retrieval to operate in a disordered space. This produces fragmented contexts, misallocates tokens to non-evidential regions, and increases the reasoning burden for downstream language models.To address these issues, we propose SF-RAG, an RAG framework that treats the native hierarchical structure of academic papers as a low-entropy retrieval prior.SF-RAG first inherits the native hierarchy to construct a structure-fidelity index, which prevents entropy increase at the source.It then designs a path-guided retrieval mechanism that aligns query semantics to relevant sections and selects high relevance root-to-leaf paths under a fixed token budget, yielding compact, coherent, and low-entropy retrieval contexts.In contrast to existing RAG approaches, SF-RAG avoids entropy increase caused by destructive preprocessing and provides a native low-entropy structural basis for subsequent retrieval. We further introduce entropy-based structural diagnostics to quantify retrieval fragmentation and evidence allocation accuracy.Evaluations across three QA benchmarks show that SF-RAG significantly reduces retrieval fragmentation and improves evidence allocation. These structural benefits drive superior answer quality, establishing a scalable foundation for intelligent engineering document systems and future applications in technical specifications.