SOTAVerified

Evaluating Search Engines and Large Language Models for Answering Health Questions

2024-07-17

Marcos Fernández-Pichel, Juan C. Pichel, David E. Losada

Code Available

Abstract

Search engines (SEs) have traditionally been the primary tools for information seeking, but Large Language Models (LLMs) are emerging as powerful alternatives, particularly for question-answering tasks. This study compares the performance of four popular SEs, seven LLMs, and retrieval-augmented generation (RAG) variants in answering 150 health-related questions from the TREC Health Misinformation (HM) Track. Results show that SEs correctly answer between 50% and 70% of the questions, often hindered by retrieved results that do not directly address the health question. LLMs deliver higher accuracy, correctly answering about 80% of the questions, though their performance is sensitive to the input prompts. RAG methods significantly enhance the effectiveness of smaller LLMs, improving accuracy by up to 30% through the integration of retrieval evidence.
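The evaluation described above boils down to scoring each system's yes/no answer against a ground-truth stance for every question. The following is a minimal sketch of that scoring step; the question texts, answers, and the `accuracy` helper are all hypothetical illustrations, not the paper's actual data or code.

```python
def accuracy(predictions, ground_truth):
    """Fraction of questions where the system's yes/no answer
    matches the ground-truth stance."""
    correct = sum(1 for q, ans in predictions.items()
                  if ground_truth.get(q) == ans)
    return correct / len(ground_truth)

# Hypothetical TREC-HM-style yes/no questions with ground-truth stances.
truth = {
    "Does vitamin C cure the common cold?": "no",
    "Can duct tape remove warts?": "no",
    "Does exercise reduce blood pressure?": "yes",
}

# Hypothetical answers produced by one system under evaluation.
system_answers = {
    "Does vitamin C cure the common cold?": "no",
    "Can duct tape remove warts?": "yes",
    "Does exercise reduce blood pressure?": "yes",
}

print(round(accuracy(system_answers, truth), 2))  # → 0.67 (2 of 3 correct)
```

In the paper's terms, an SE scoring 50–70% and an LLM scoring about 80% would simply be systems whose `predictions` dictionaries match the ground truth on that fraction of the 150 questions.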
