| Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs | Jun 13, 2025 | Medical Question Answering, MedQA | —Unverified | 0 |
| Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards | Jun 13, 2025 | Diagnostic, MedQA | —Unverified | 0 |
| Med-REFL: Medical Reasoning Enhancement via Self-Corrected Fine-grained Reflection | Jun 11, 2025 | Medical Question Answering, MedQA | Code Available | 0 |
| Enabling On-Device Medical AI Assistants via Input-Driven Saliency Adaptation | Jun 7, 2025 | MedQA, Quantization | —Unverified | 0 |
| Second Opinion Matters: Towards Adaptive Clinical AI via the Consensus of Expert Model Ensemble | May 29, 2025 | Decision Making, MedQA | —Unverified | 0 |
| TAGS: A Test-Time Generalist-Specialist Framework with Retrieval-Augmented Reasoning and Verification | May 23, 2025 | MedQA | Code Available | 0 |
| WiNGPT-3.0 Technical Report | May 23, 2025 | Diagnostic, MedQA | Code Available | 0 |
| Disentangling Reasoning and Knowledge in Medical Large Language Models | May 16, 2025 | Diagnostic, MedQA | —Unverified | 0 |
| What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs | May 15, 2025 | All, Benchmarking | —Unverified | 0 |
| A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage | Apr 28, 2025 | MedQA | —Unverified | 0 |
| CliniChat: A Multi-Source Knowledge-Driven Framework for Clinical Interview Dialogue Reconstruction and Evaluation | Apr 14, 2025 | MedQA | —Unverified | 0 |
| Evaluation of the phi-3-mini SLM for identification of texts related to medicine, health, and sports injuries | Mar 31, 2025 | 4k, MedQA | —Unverified | 0 |
| Susceptibility of Large Language Models to User-Driven Factors in Medical Queries | Mar 26, 2025 | Diagnostic, MedQA | —Unverified | 0 |
| Bias Evaluation and Mitigation in Retrieval-Augmented Medical Question-Answering Systems | Mar 19, 2025 | counterfactual, Decision Making | —Unverified | 0 |
| MDTeamGPT: A Self-Evolving LLM-based Multi-Agent Framework for Multi-Disciplinary Team Medical Consultation | Mar 18, 2025 | MedQA | —Unverified | 0 |
| Correctness Coverage Evaluation for Medical Multiple-Choice Question Answering Based on the Enhanced Conformal Prediction Framework | Mar 7, 2025 | Conformal Prediction, Medical Question Answering | —Unverified | 0 |
| AutoMedPrompt: A New Framework for Optimizing LLM Medical Prompts Using Textual Gradients | Feb 21, 2025 | MedQA, Prompt Engineering | —Unverified | 0 |
| Agentic Medical Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge | Feb 18, 2025 | Graph Generation, Knowledge Graphs | —Unverified | 0 |
| OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning | Feb 16, 2025 | MedQA, MMLU | —Unverified | 0 |
| MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding | Jan 30, 2025 | Benchmarking, Decision Making | —Unverified | 0 |
| CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering | Jan 30, 2025 | General Knowledge, Language Modeling | —Unverified | 0 |
| LLM-MedQA: Enhancing Medical Question Answering through Case Studies in Large Language Models | Dec 31, 2024 | Medical Question Answering, MedQA | —Unverified | 0 |
| AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset | Nov 23, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| MMDS: A Multimodal Medical Diagnosis System Integrating Image Analysis and Knowledge-based Departmental Consultation | Oct 20, 2024 | Emotion Recognition, Facial Emotion Recognition | —Unverified | 0 |
| IMAS: A Comprehensive Agentic Approach to Rural Healthcare Delivery | Oct 13, 2024 | MedQA | Code Available | 0 |