| MedMobile: A mobile-sized language model with expert-level clinical capabilities | Oct 11, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| DoPAMine: Domain-specific Pre-training Adaptation from seed-guided data Mining | Sep 30, 2024 | Continual Pretraining, Domain Adaptation | Unverified | 0 |
| A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? | Sep 23, 2024 | Hallucination, MedQA | Unverified | 0 |
| Reliable and diverse evaluation of LLM medical knowledge mastery | Sep 22, 2024 | Diversity, MedQA | Unverified | 0 |
| Eir: Thai Medical Large Language Models | Sep 13, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| DiversityMedQA: Assessing Demographic Biases in Medical Diagnosis using Large Language Models | Sep 2, 2024 | Medical Diagnosis, MedQA | Unverified | 0 |
| Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks | Jun 17, 2024 | MedQA | Code Available | 0 |
| MultifacetEval: Multifaceted Evaluation to Probe LLMs in Mastering Medical Knowledge | Jun 5, 2024 | MedQA | Code Available | 0 |
| MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering | Jun 3, 2024 | Medical Question Answering, MedQA | Unverified | 0 |
| Superhuman performance in urology board questions by an explainable large language model enabled for context integration of the European Association of Urology guidelines: the UroBot study | Jun 3, 2024 | Chatbot, Language Modeling | Unverified | 0 |
| AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments | May 13, 2024 | Decision Making, Diagnostic | Unverified | 0 |
| Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents | May 5, 2024 | MedQA, Question Answering | Unverified | 0 |
| Capabilities of Gemini Models in Medicine | Apr 29, 2024 | In-Context Learning, MedQA | Unverified | 0 |
| Assessing The Potential Of Mid-Sized Language Models For Clinical QA | Apr 24, 2024 | MedQA, Question Answering | Unverified | 0 |
| LM^2: A Simple Society of Language Models Solves Complex Reasoning | Apr 2, 2024 | Math, MedQA | Code Available | 0 |
| Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering | Mar 7, 2024 | Information Retrieval, Language Modelling | Code Available | 0 |
| KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations | Mar 3, 2024 | MedQA, MMLU | Unverified | 0 |
| OpenMedLM: Prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models | Feb 29, 2024 | Medical Question Answering, MedQA | Unverified | 0 |
| Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond | Feb 22, 2024 | Form, Medical Question Answering | Unverified | 0 |
| SM70: A Large Language Model for Medical Devices | Dec 12, 2023 | Decision Making, Information Retrieval | Unverified | 0 |
| MKRAG: Medical Knowledge Retrieval Augmented Generation for Medical Question Answering | Sep 27, 2023 | In-Context Learning, Medical Question Answering | Unverified | 0 |
| Knowledge Solver: Teaching LLMs to Search for Domain Knowledge from Knowledge Graphs | Sep 6, 2023 | Hallucination, Knowledge Graphs | Unverified | 0 |
| Challenges of GPT-3-based Conversational Agents for Healthcare | Aug 28, 2023 | Medical Question Answering, MedQA | Unverified | 0 |
| Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology | Apr 24, 2023 | Benchmarking, Decision Making | Code Available | 0 |
| DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents | Mar 30, 2023 | Conversation Summarization, Language Modeling | Code Available | 0 |
| GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering | Mar 22, 2023 | Common Sense Reasoning, Knowledge Graphs | Unverified | 0 |
| Generating multiple-choice questions for medical question answering with distractors and cue-masking | Mar 13, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Hierarchical Representation-based Dynamic Reasoning Network for Biomedical Question Answering | Oct 1, 2022 | MedQA, Question Answering | Unverified | 0 |
| GreaseLM: Graph REASoning Enhanced Language Models | Sep 29, 2021 | Knowledge Graphs, Medical Question Answering | Unverified | 0 |
| Medical Exam Question Answering with Large-scale Reading Comprehension | Feb 28, 2018 | MedQA, Question Answering | Unverified | 0 |