SOTAVerified

Medical Question Answering

Papers

Showing 1–50 of 139 papers

Title | Status | Hype
From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents | | 0
Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs | | 0
MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models | | 0
Med-REFL: Medical Reasoning Enhancement via Self-Corrected Fine-grained Reflection | Code | 0
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning | Code | 2
ClinBench-HPB: A Clinical Benchmark for Evaluating LLMs in Hepato-Pancreato-Biliary Diseases | | 0
Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs | | 0
MedPAIR: Measuring Physicians and AI Relevance Alignment in Medical Question Answering | | 0
ER-REASON: A Benchmark Dataset for LLM-Based Clinical Reasoning in the Emergency Room | | 0
AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare | Code | 0
Task Specific Pruning with LLM-Sieve: How Many Parameters Does Your Task Really Need? | | 0
Collaboration among Multiple Large Language Models for Medical Question Answering | | 0
Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model | Code | 0
What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs | | 0
Building a Human-Verified Clinical Reasoning Dataset via a Human LLM Hybrid Pipeline for Trustworthy Medical AI | | 0
Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding | | 0
Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA | Code | 0
Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations | Code | 1
Exploring the Role of Knowledge Graph-Based RAG in Japanese Medical Question Answering with Small-Scale LLMs | | 0
PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization | | 0
MKG-Rank: Enhancing Large Language Models with Knowledge Graph for Multilingual Medical Question Answering | | 0
Bias Evaluation and Mitigation in Retrieval-Augmented Medical Question-Answering Systems | | 0
MAP: Evaluation and Multi-Agent Enhancement of Large Language Models for Inpatient Pathways | | 0
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning | Code | 2
Correctness Coverage Evaluation for Medical Multiple-Choice Question Answering Based on the Enhanced Conformal Prediction Framework | | 0
Structured Outputs Enable General-Purpose LLMs to be Medical Experts | | 0
Addressing Overprescribing Challenges: Fine-Tuning Large Language Models for Medication Recommendation Tasks | Code | 0
Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning | | 0
MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models | | 0
RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering | | 0
Improving Clinical Question Answering with Multi-Task Learning: A Joint Approach for Answer Extraction and Medical Categorization | | 0
Agentic Medical Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge | | 0
SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering? | | 0
Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs | Code | 1
A Comprehensive Study on Fine-Tuning Large Language Models for Medical Question Answering Using Classification Models and Comparative Analysis | | 0
Causal Graphs Meet Thoughts: Enhancing Complex Reasoning in Graph-Augmented LLMs | Code | 0
LLM-MedQA: Enhancing Medical Question Answering through Case Studies in Large Language Models | | 0
An Empirical Evaluation of Large Language Models on Consumer Health Questions | | 0
Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track | | 0
Overview of TREC 2024 Biomedical Generative Retrieval (BioGen) Track | | 0
AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset | | 0
A Benchmark for Long-Form Medical Question Answering | Code | 0
Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering | | 0
The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models | Code | 0
Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress? | Code | 0
Diagnosing Medical Datasets with Training Dynamics | Code | 0
Rationale-Guided Retrieval Augmented Generation for Medical Question Answering | Code | 1
LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models | | 0
Large Language Model Benchmarks in Medical Tasks | | 0
MedGo: A Chinese Medical Large Language Model | | 0

No leaderboard results yet.