| From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents | Jun 18, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs | Jun 13, 2025 | Medical Question Answering, MedQA | Unverified | 0 |
| MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models | Jun 12, 2025 | Image Segmentation, Medical Diagnosis | Unverified | 0 |
| Med-REFL: Medical Reasoning Enhancement via Self-Corrected Fine-grained Reflection | Jun 11, 2025 | Medical Question Answering, MedQA | Code Available | 0 |
| ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning | Jun 11, 2025 | Medical Question Answering, Question Answering | Code Available | 2 |
| ClinBench-HPB: A Clinical Benchmark for Evaluating LLMs in Hepato-Pancreato-Biliary Diseases | May 30, 2025 | Medical Question Answering, Multiple-choice | Unverified | 0 |
| Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs | May 30, 2025 | Fact Checking, Hallucination | Unverified | 0 |
| MedPAIR: Measuring Physicians and AI Relevance Alignment in Medical Question Answering | May 29, 2025 | Medical Question Answering, Question Answering | Unverified | 0 |
| ER-REASON: A Benchmark Dataset for LLM-Based Clinical Reasoning in the Emergency Room | May 28, 2025 | Medical Question Answering, Question Answering | Unverified | 0 |
| AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare | May 26, 2025 | Benchmarking, Medical Diagnosis | Code Available | 0 |
| Task Specific Pruning with LLM-Sieve: How Many Parameters Does Your Task Really Need? | May 23, 2025 | Medical Question Answering, Quantization | Unverified | 0 |
| Collaboration among Multiple Large Language Models for Medical Question Answering | May 22, 2025 | Medical Question Answering, Multiple-choice | Unverified | 0 |
| Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model | May 21, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs | May 15, 2025 | All, Benchmarking | Unverified | 0 |
| Building a Human-Verified Clinical Reasoning Dataset via a Human LLM Hybrid Pipeline for Trustworthy Medical AI | May 11, 2025 | Medical Question Answering, Question Answering | Unverified | 0 |
| Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding | Apr 30, 2025 | Medical Question Answering, Question Answering | Unverified | 0 |
| Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA | Apr 30, 2025 | Information Retrieval, Medical Question Answering | Code Available | 0 |
| Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations | Apr 19, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| Exploring the Role of Knowledge Graph-Based RAG in Japanese Medical Question Answering with Small-Scale LLMs | Apr 15, 2025 | Medical Question Answering, Question Answering | Unverified | 0 |
| PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization | Apr 10, 2025 | Anomaly Detection, Bilevel Optimization | Unverified | 0 |
| MKG-Rank: Enhancing Large Language Models with Knowledge Graph for Multilingual Medical Question Answering | Mar 20, 2025 | Knowledge Graphs, Medical Question Answering | Unverified | 0 |
| Bias Evaluation and Mitigation in Retrieval-Augmented Medical Question-Answering Systems | Mar 19, 2025 | counterfactual, Decision Making | Unverified | 0 |
| MAP: Evaluation and Multi-Agent Enhancement of Large Language Models for Inpatient Pathways | Mar 17, 2025 | Decision Making, Medical Question Answering | Unverified | 0 |
| MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning | Mar 10, 2025 | Benchmarking, Medical Question Answering | Code Available | 2 |
| Correctness Coverage Evaluation for Medical Multiple-Choice Question Answering Based on the Enhanced Conformal Prediction Framework | Mar 7, 2025 | Conformal Prediction, Medical Question Answering | Unverified | 0 |
| Structured Outputs Enable General-Purpose LLMs to be Medical Experts | Mar 5, 2025 | Clinical Knowledge, Medical Question Answering | Unverified | 0 |
| Addressing Overprescribing Challenges: Fine-Tuning Large Language Models for Medication Recommendation Tasks | Mar 5, 2025 | Medical Question Answering, parameter-efficient fine-tuning | Code Available | 0 |
| Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning | Feb 27, 2025 | Math, Medical Question Answering | Unverified | 0 |
| MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models | Feb 20, 2025 | Decision Making, Hallucination | Unverified | 0 |
| RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering | Feb 19, 2025 | Decision Making, Language Modeling | Unverified | 0 |
| Improving Clinical Question Answering with Multi-Task Learning: A Joint Approach for Answer Extraction and Medical Categorization | Feb 18, 2025 | Information Retrieval, Medical Question Answering | Unverified | 0 |
| Agentic Medical Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge | Feb 18, 2025 | Graph Generation, Knowledge Graphs | Unverified | 0 |
| SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering? | Feb 18, 2025 | Medical Question Answering, Question Answering | Unverified | 0 |
| Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs | Feb 7, 2025 | Federated Learning, Medical Question Answering | Code Available | 1 |
| A Comprehensive Study on Fine-Tuning Large Language Models for Medical Question Answering Using Classification Models and Comparative Analysis | Jan 27, 2025 | Medical Question Answering, Question Answering | Unverified | 0 |
| Causal Graphs Meet Thoughts: Enhancing Complex Reasoning in Graph-Augmented LLMs | Jan 24, 2025 | Knowledge Graphs, Medical Question Answering | Code Available | 0 |
| LLM-MedQA: Enhancing Medical Question Answering through Case Studies in Large Language Models | Dec 31, 2024 | Medical Question Answering, MedQA | Unverified | 0 |
| An Empirical Evaluation of Large Language Models on Consumer Health Questions | Dec 31, 2024 | Medical Question Answering, Question Answering | Unverified | 0 |
| Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track | Dec 15, 2024 | Image Captioning, Medical Question Answering | Unverified | 0 |
| Overview of TREC 2024 Biomedical Generative Retrieval (BioGen) Track | Nov 27, 2024 | Medical Question Answering, Question Answering | Unverified | 0 |
| AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset | Nov 23, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| A Benchmark for Long-Form Medical Question Answering | Nov 14, 2024 | Answer Generation, Form | Code Available | 0 |
| Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering | Nov 14, 2024 | Medical Question Answering, Misinformation | Unverified | 0 |
| The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models | Nov 13, 2024 | Medical Question Answering, Question Answering | Code Available | 0 |
| Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress? | Nov 6, 2024 | Medical Question Answering, Question Answering | Code Available | 0 |
| Diagnosing Medical Datasets with Training Dynamics | Nov 3, 2024 | Medical Question Answering, Question Answering | Code Available | 0 |
| Rationale-Guided Retrieval Augmented Generation for Medical Question Answering | Nov 1, 2024 | Medical Question Answering, Question Answering | Code Available | 1 |
| LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models | Oct 31, 2024 | Fact Checking, Medical Question Answering | Unverified | 0 |
| Large Language Model Benchmarks in Medical Tasks | Oct 28, 2024 | Image Captioning, Language Modeling | Unverified | 0 |
| MedGo: A Chinese Medical Large Language Model | Oct 27, 2024 | Language Modeling, Language Modelling | Unverified | 0 |