| MedMobile: A mobile-sized language model with expert-level clinical capabilities | Oct 11, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| DoPAMine: Domain-specific Pre-training Adaptation from seed-guided data Mining | Sep 30, 2024 | Continual Pretraining, Domain Adaptation | Unverified | 0 |
| A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? | Sep 23, 2024 | Hallucination, MedQA | Unverified | 0 |
| Reliable and diverse evaluation of LLM medical knowledge mastery | Sep 22, 2024 | Diversity, MedQA | Unverified | 0 |
| Eir: Thai Medical Large Language Models | Sep 13, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| DiversityMedQA: Assessing Demographic Biases in Medical Diagnosis using Large Language Models | Sep 2, 2024 | Medical Diagnosis, MedQA | Unverified | 0 |
| Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks | Jun 17, 2024 | MedQA | Code Available | 0 |
| MultifacetEval: Multifaceted Evaluation to Probe LLMs in Mastering Medical Knowledge | Jun 5, 2024 | MedQA | Code Available | 0 |
| MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering | Jun 3, 2024 | Medical Question Answering, MedQA | Unverified | 0 |
| Superhuman performance in urology board questions by an explainable large language model enabled for context integration of the European Association of Urology guidelines: the UroBot study | Jun 3, 2024 | Chatbot, Language Modeling | Unverified | 0 |