| AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments | May 13, 2024 | Decision Making, Diagnostic | Unverified | 0 |
| Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents | May 5, 2024 | MedQA, Question Answering | Unverified | 0 |
| Capabilities of Gemini Models in Medicine | Apr 29, 2024 | In-Context Learning, MedQA | Unverified | 0 |
| Assessing The Potential Of Mid-Sized Language Models For Clinical QA | Apr 24, 2024 | MedQA, Question Answering | Unverified | 0 |
| LM^2: A Simple Society of Language Models Solves Complex Reasoning | Apr 2, 2024 | Math, MedQA | Code Available | 0 |
| Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering | Mar 7, 2024 | Information Retrieval, Language Modelling | Code Available | 0 |
| KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations | Mar 3, 2024 | MedQA, MMLU | Unverified | 0 |
| OpenMedLM: Prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models | Feb 29, 2024 | Medical Question Answering, MedQA | Unverified | 0 |
| Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond | Feb 22, 2024 | Form, Medical Question Answering | Unverified | 0 |
| SM70: A Large Language Model for Medical Devices | Dec 12, 2023 | Decision Making, Information Retrieval | Unverified | 0 |