| Assessing The Potential Of Mid-Sized Language Models For Clinical QA | Apr 24, 2024 | MedQA, Question Answering | Unverified | 0 |
| AutoMedPrompt: A New Framework for Optimizing LLM Medical Prompts Using Textual Gradients | Feb 21, 2025 | MedQA, Prompt Engineering | Unverified | 0 |
| Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content | Jun 25, 2025 | Articles, Continual Pretraining | Unverified | 0 |
| CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering | Jan 30, 2025 | General Knowledge, Language Modeling | Unverified | 0 |
| Capabilities of Gemini Models in Medicine | Apr 29, 2024 | In-Context Learning, MedQA | Unverified | 0 |
| Challenges of GPT-3-based Conversational Agents for Healthcare | Aug 28, 2023 | Medical Question Answering, MedQA | Unverified | 0 |
| CliniChat: A Multi-Source Knowledge-Driven Framework for Clinical Interview Dialogue Reconstruction and Evaluation | Apr 14, 2025 | MedQA | Unverified | 0 |
| DiversityMedQA: Assessing Demographic Biases in Medical Diagnosis using Large Language Models | Sep 2, 2024 | Medical Diagnosis, MedQA | Unverified | 0 |
| DoPAMine: Domain-specific Pre-training Adaptation from seed-guided data Mining | Sep 30, 2024 | Continual Pretraining, Domain Adaptation | Unverified | 0 |
| Eir: Thai Medical Large Language Models | Sep 13, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| Enabling On-Device Medical AI Assistants via Input-Driven Saliency Adaptation | Jun 7, 2025 | MedQA, Quantization | Unverified | 0 |
| Bias Evaluation and Mitigation in Retrieval-Augmented Medical Question-Answering Systems | Mar 19, 2025 | counterfactual, Decision Making | Unverified | 0 |
| Evaluation of the phi-3-mini SLM for identification of texts related to medicine, health, and sports injuries | Mar 31, 2025 | 4k, MedQA | Unverified | 0 |
| Second Opinion Matters: Towards Adaptive Clinical AI via the Consensus of Expert Model Ensemble | May 29, 2025 | Decision Making, MedQA | Unverified | 0 |
| SM70: A Large Language Model for Medical Devices | Dec 12, 2023 | Decision Making, Information Retrieval | Unverified | 0 |
| Correctness Coverage Evaluation for Medical Multiple-Choice Question Answering Based on the Enhanced Conformal Prediction Framework | Mar 7, 2025 | Conformal Prediction, Medical Question Answering | Unverified | 0 |
| Superhuman performance in urology board questions by an explainable large language model enabled for context integration of the European Association of Urology guidelines: the UroBot study | Jun 3, 2024 | Chatbot, Language Modeling | Unverified | 0 |
| Susceptibility of Large Language Models to User-Driven Factors in Medical Queries | Mar 26, 2025 | Diagnostic, MedQA | Unverified | 0 |
| What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs | May 15, 2025 | All, Benchmarking | Unverified | 0 |
| WiNGPT-3.0 Technical Report | May 23, 2025 | Diagnostic, MedQA | Code Available | 0 |
| MedMobile: A mobile-sized language model with expert-level clinical capabilities | Oct 11, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| TAGS: A Test-Time Generalist-Specialist Framework with Retrieval-Augmented Reasoning and Verification | May 23, 2025 | MedQA | Code Available | 0 |
| LM^2: A Simple Society of Language Models Solves Complex Reasoning | Apr 2, 2024 | Math, MedQA | Code Available | 0 |
| Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks | Jun 17, 2024 | MedQA | Code Available | 0 |
| MultifacetEval: Multifaceted Evaluation to Probe LLMs in Mastering Medical Knowledge | Jun 5, 2024 | MedQA | Code Available | 0 |
| Med-REFL: Medical Reasoning Enhancement via Self-Corrected Fine-grained Reflection | Jun 11, 2025 | Medical Question Answering, MedQA | Code Available | 0 |
| IMAS: A Comprehensive Agentic Approach to Rural Healthcare Delivery | Oct 13, 2024 | MedQA | Code Available | 0 |
| Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering | Mar 7, 2024 | Information Retrieval, Language Modelling | Code Available | 0 |
| DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents | Mar 30, 2023 | Conversation Summarization, Language Modeling | Code Available | 0 |
| Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology | Apr 24, 2023 | Benchmarking, Decision Making | Code Available | 0 |