SOTAVerified

Medical Question Answering

Papers

Showing 1-25 of 139 papers

Title | Status | Hype
From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents | - | 0
Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs | - | 0
MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models | - | 0
Med-REFL: Medical Reasoning Enhancement via Self-Corrected Fine-grained Reflection | Code | 0
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning | Code | 2
ClinBench-HPB: A Clinical Benchmark for Evaluating LLMs in Hepato-Pancreato-Biliary Diseases | - | 0
Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs | - | 0
MedPAIR: Measuring Physicians and AI Relevance Alignment in Medical Question Answering | - | 0
ER-REASON: A Benchmark Dataset for LLM-Based Clinical Reasoning in the Emergency Room | - | 0
AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare | Code | 0
Task Specific Pruning with LLM-Sieve: How Many Parameters Does Your Task Really Need? | - | 0
Collaboration among Multiple Large Language Models for Medical Question Answering | - | 0
Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model | Code | 0
What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs | - | 0
Building a Human-Verified Clinical Reasoning Dataset via a Human LLM Hybrid Pipeline for Trustworthy Medical AI | - | 0
Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding | - | 0
Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA | Code | 0
Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations | Code | 1
Exploring the Role of Knowledge Graph-Based RAG in Japanese Medical Question Answering with Small-Scale LLMs | - | 0
PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization | - | 0
MKG-Rank: Enhancing Large Language Models with Knowledge Graph for Multilingual Medical Question Answering | - | 0
Bias Evaluation and Mitigation in Retrieval-Augmented Medical Question-Answering Systems | - | 0
MAP: Evaluation and Multi-Agent Enhancement of Large Language Models for Inpatient Pathways | - | 0
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning | Code | 2
Correctness Coverage Evaluation for Medical Multiple-Choice Question Answering Based on the Enhanced Conformal Prediction Framework | - | 0

No leaderboard results yet.