| OpenMedLM: Prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models | Feb 29, 2024 | Medical Question Answering, MedQA | —Unverified | 0 | 0 |
| Second Opinion Matters: Towards Adaptive Clinical AI via the Consensus of Expert Model Ensemble | May 29, 2025 | Decision Making, MedQA | —Unverified | 0 | 0 |
| SM70: A Large Language Model for Medical Devices | Dec 12, 2023 | Decision Making, Information Retrieval | —Unverified | 0 | 0 |
| Correctness Coverage Evaluation for Medical Multiple-Choice Question Answering Based on the Enhanced Conformal Prediction Framework | Mar 7, 2025 | Conformal Prediction, Medical Question Answering | —Unverified | 0 | 0 |
| Superhuman performance in urology board questions by an explainable large language model enabled for context integration of the European Association of Urology guidelines: the UroBot study | Jun 3, 2024 | Chatbot, Language Modeling | —Unverified | 0 | 0 |
| Susceptibility of Large Language Models to User-Driven Factors in Medical Queries | Mar 26, 2025 | Diagnostic, MedQA | —Unverified | 0 | 0 |
| What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs | May 15, 2025 | All, Benchmarking | —Unverified | 0 | 0 |
| Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond | Feb 22, 2024 | Form, Medical Question Answering | —Unverified | 0 | 0 |
| Reliable and diverse evaluation of LLM medical knowledge mastery | Sep 22, 2024 | Diversity, MedQA | —Unverified | 0 | 0 |
| Disentangling Reasoning and Knowledge in Medical Large Language Models | May 16, 2025 | Diagnostic, MedQA | —Unverified | 0 | 0 |