| Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine | Nov 28, 2023 | Electrical Engineering, Experimental Design | Code Available | 5 |
| Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions | Aug 1, 2024 | Medical Question Answering, MedQA | Code Available | 4 |
| Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support | Feb 25, 2025 | Decision Making, Diagnostic | Code Available | 2 |
| GreaseLM: Graph REASoning Enhanced Language Models for Question Answering | Jan 21, 2022 | Knowledge Graphs, Medical Question Answering | Code Available | 2 |
| Synthetic Data RL: Task Definition Is All You Need | May 18, 2025 | All, GSM8K | Code Available | 2 |
| MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning | Nov 16, 2023 | MedQA, MMLU | Code Available | 2 |
| What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams | Sep 28, 2020 | MedQA, Multiple-choice | Code Available | 2 |
| Towards Expert-Level Medical Question Answering with Large Language Models | May 16, 2023 | Medical Question Answering, MedQA | Code Available | 1 |
| MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework | Oct 2, 2024 | Benchmarking, Instruction Following | Code Available | 1 |
| Relation-Aware Language-Graph Transformer for Question Answering | Dec 2, 2022 | Medical Question Answering, MedQA | Code Available | 1 |
| Kformer: Knowledge Injection in Transformer Feed-Forward Layers | Jan 15, 2022 | Language Modelling, Medical Question Answering | Code Available | 1 |
| QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering | Apr 13, 2021 | Common Sense Reasoning, Graph Representation Learning | Code Available | 1 |
| O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning | Jan 11, 2025 | Decision Making, Diagnostic | Code Available | 1 |
| Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks | May 28, 2023 | MedQA, Memorization | Code Available | 1 |
| FiTs: Fine-grained Two-stage Training for Knowledge-aware Question Answering | Feb 23, 2023 | Knowledge Graphs, Medical Question Answering | Code Available | 1 |
| Variational Open-Domain Question Answering | Sep 23, 2022 | Language Modelling, MedQA | Code Available | 1 |
| To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering | Mar 4, 2024 | MedQA, MMLU | Code Available | 1 |
| Large Language Models Encode Clinical Knowledge | Dec 26, 2022 | Clinical Knowledge, MedQA | Code Available | 1 |
| MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports | May 16, 2025 | Diagnostic, Math | Code Available | 1 |
| Clinical Camel: An Open Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding | May 19, 2023 | GPU, Language Modeling | Code Available | 1 |
| Can large language models reason about medical questions? | Jul 17, 2022 | MedQA, Multiple-choice | Code Available | 1 |
| MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning | Jun 3, 2024 | Diagnostic, MedQA | Code Available | 1 |
| Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training | Jun 18, 2025 | MedQA, MMLU | Unverified | 0 |
| Generating multiple-choice questions for medical question answering with distractors and cue-masking | Mar 13, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering | Mar 22, 2023 | Common Sense Reasoning, Knowledge Graphs | Unverified | 0 |
| GreaseLM: Graph REASoning Enhanced Language Models | Sep 29, 2021 | Knowledge Graphs, Medical Question Answering | Unverified | 0 |
| Hierarchical Representation-based Dynamic Reasoning Network for Biomedical Question Answering | Oct 1, 2022 | MedQA, Question Answering | Unverified | 0 |
| Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs | Jun 13, 2025 | Medical Question Answering, MedQA | Unverified | 0 |
| Knowledge Solver: Teaching LLMs to Search for Domain Knowledge from Knowledge Graphs | Sep 6, 2023 | Hallucination, Knowledge Graphs | Unverified | 0 |
| KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations | Mar 3, 2024 | MedQA, MMLU | Unverified | 0 |
| LLM-MedQA: Enhancing Medical Question Answering through Case Studies in Large Language Models | Dec 31, 2024 | Medical Question Answering, MedQA | Unverified | 0 |
| LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing | Jun 17, 2025 | ARC, CoLA | Unverified | 0 |
| MDTeamGPT: A Self-Evolving LLM-based Multi-Agent Framework for Multi-Disciplinary Team Medical Consultation | Mar 18, 2025 | MedQA | Unverified | 0 |
| MKRAG: Medical Knowledge Retrieval Augmented Generation for Medical Question Answering | Sep 27, 2023 | In-Context Learning, Medical Question Answering | Unverified | 0 |
| MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering | Jun 3, 2024 | Medical Question Answering, MedQA | Unverified | 0 |
| Medical Exam Question Answering with Large-scale Reading Comprehension | Feb 28, 2018 | MedQA, Question Answering | Unverified | 0 |
| Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards | Jun 13, 2025 | Diagnostic, MedQA | Unverified | 0 |
| MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding | Jan 30, 2025 | Benchmarking, Decision Making | Unverified | 0 |
| MMDS: A Multimodal Medical Diagnosis System Integrating Image Analysis and Knowledge-based Departmental Consultation | Oct 20, 2024 | Emotion Recognition, Facial Emotion Recognition | Unverified | 0 |
| OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning | Feb 16, 2025 | MedQA, MMLU | Unverified | 0 |
| OpenMedLM: Prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models | Feb 29, 2024 | Medical Question Answering, MedQA | Unverified | 0 |
| Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond | Feb 22, 2024 | Form, Medical Question Answering | Unverified | 0 |
| Reliable and diverse evaluation of LLM medical knowledge mastery | Sep 22, 2024 | Diversity, MedQA | Unverified | 0 |
| Disentangling Reasoning and Knowledge in Medical Large Language Models | May 16, 2025 | Diagnostic, MedQA | Unverified | 0 |
| Agentic Medical Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge | Feb 18, 2025 | Graph Generation, Knowledge Graphs | Unverified | 0 |
| A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage | Apr 28, 2025 | MedQA | Unverified | 0 |
| AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset | Nov 23, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments | May 13, 2024 | Decision Making, Diagnostic | Unverified | 0 |
| Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents | May 5, 2024 | MedQA, Question Answering | Unverified | 0 |
| A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? | Sep 23, 2024 | Hallucination, MedQA | Unverified | 0 |