| OpenMedLM: Prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models | Feb 29, 2024 | Medical Question AnsweringMedQA | —Unverified | 0 |
| Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond | Feb 22, 2024 | FormMedical Question Answering | —Unverified | 0 |
| Reliable and diverse evaluation of LLM medical knowledge mastery | Sep 22, 2024 | DiversityMedQA | —Unverified | 0 |
| Disentangling Reasoning and Knowledge in Medical Large Language Models | May 16, 2025 | DiagnosticMedQA | —Unverified | 0 |
| Agentic Medical Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge | Feb 18, 2025 | Graph GenerationKnowledge Graphs | —Unverified | 0 |
| A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage | Apr 28, 2025 | MedQA | —Unverified | 0 |
| AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset | Nov 23, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments | May 13, 2024 | Decision MakingDiagnostic | —Unverified | 0 |
| Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents | May 5, 2024 | MedQAQuestion Answering | —Unverified | 0 |
| A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? | Sep 23, 2024 | HallucinationMedQA | —Unverified | 0 |