| Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration | Sep 5, 2024 | Decision Making, Medical Question Answering | Code Available | 0 | 5 |
| Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA | Apr 30, 2025 | Information Retrieval, Medical Question Answering | Code Available | 0 | 5 |
| Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings | Jan 15, 2024 | Knowledge Graph Embeddings, Knowledge Graphs | Code Available | 0 | 5 |
| Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model | May 21, 2025 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models | Nov 13, 2024 | Medical Question Answering, Question Answering | Code Available | 0 | 5 |
| Medical Question Understanding and Answering with Knowledge Grounding and Semantic Self-Supervision | Sep 30, 2022 | Medical Question Answering, Question Answering | Code Available | 0 | 5 |
| MRC-based Medical NER with Multi-task Learning and Multi-strategies | Oct 1, 2022 | Boundary Detection, Decoder | Unverified | 0 | 0 |
| MRC-based Nested Medical NER with Co-prediction and Adaptive Pre-training | Mar 23, 2024 | Knowledge Graphs, Machine Reading Comprehension | Unverified | 0 | 0 |
| Multilingual Medical Question Answering and Information Retrieval for Rural Health Intelligence Access | Jun 2, 2021 | Information Retrieval, Medical Question Answering | Unverified | 0 | 0 |
| MultiMed: Massively Multimodal and Multitask Medical Understanding | Aug 22, 2024 | Benchmarking, Medical Question Answering | Unverified | 0 | 0 |
| OpenMedLM: Prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models | Feb 29, 2024 | Medical Question Answering, MedQA | Unverified | 0 | 0 |
| Overview of TREC 2024 Biomedical Generative Retrieval (BioGen) Track | Nov 27, 2024 | Medical Question Answering, Question Answering | Unverified | 0 | 0 |
| Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track | Dec 15, 2024 | Image Captioning, Medical Question Answering | Unverified | 0 | 0 |
| PEFT-MedAware: Large Language Model for Medical Awareness | Nov 17, 2023 | Computational Efficiency, Language Modeling | Unverified | 0 | 0 |
| PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization | Apr 10, 2025 | Anomaly Detection, Bilevel Optimization | Unverified | 0 | 0 |
| Large Language Models Leverage External Knowledge to Extend Clinical Insight Beyond Language Boundaries | May 17, 2023 | Clinical Knowledge, Few-Shot Learning | Unverified | 0 | 0 |
| RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering | Feb 19, 2025 | Decision Making, Language Modeling | Unverified | 0 | 0 |
| SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering? | Feb 18, 2025 | Medical Question Answering, Question Answering | Unverified | 0 | 0 |
| Large Language Models are In-context Teachers for Knowledge Reasoning | Nov 12, 2023 | In-Context Learning, Information Retrieval | Unverified | 0 | 0 |
| SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research | Jul 3, 2024 | Diagnostic, Medical Question Answering | Unverified | 0 | 0 |
| Correctness Coverage Evaluation for Medical Multiple-Choice Question Answering Based on the Enhanced Conformal Prediction Framework | Mar 7, 2025 | Conformal Prediction, Medical Question Answering | Unverified | 0 | 0 |
| Structured Outputs Enable General-Purpose LLMs to be Medical Experts | Mar 5, 2025 | Clinical Knowledge, Medical Question Answering | Unverified | 0 | 0 |
| Superhuman performance in urology board questions by an explainable large language model enabled for context integration of the European Association of Urology guidelines: the UroBot study | Jun 3, 2024 | Chatbot, Language Modeling | Unverified | 0 | 0 |
| Task Specific Pruning with LLM-Sieve: How Many Parameters Does Your Task Really Need? | May 23, 2025 | Medical Question Answering, Quantization | Unverified | 0 | 0 |
| TCMD: A Traditional Chinese Medicine QA Dataset for Evaluating Large Language Models | Jun 7, 2024 | Medical Question Answering, Question Answering | Unverified | 0 | 0 |
| Towards Generalist Biomedical AI | Jul 26, 2023 | Medical Question Answering, Question Answering | Unverified | 0 | 0 |
| Towards Reliable Medical Question Answering: Techniques and Challenges in Mitigating Hallucinations in Language Models | Aug 25, 2024 | Decision Making, Hallucination | Unverified | 0 | 0 |
| Two-Layer Retrieval-Augmented Generation Framework for Low-Resource Medical Question Answering Using Reddit Data: Proof-of-Concept Study | May 29, 2024 | Answer Generation, Hallucination | Unverified | 0 | 0 |
| Uncertainty Estimation of Large Language Models in Medical Question Answering | Jul 11, 2024 | Medical Question Answering, Question Answering | Unverified | 0 | 0 |
| Unifying Corroborative and Contributive Attributions in Large Language Models | Nov 20, 2023 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| WangLab at MEDIQA-CORR 2024: Optimized LLM-based Programs for Medical Error Detection and Correction | Apr 22, 2024 | Diversity, Language Modeling | Unverified | 0 | 0 |
| What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs | May 15, 2025 | All, Benchmarking | Unverified | 0 | 0 |
| What Would it Take to get Biomedical QA Systems into Practice? | Sep 21, 2021 | Medical Question Answering, Question Answering | Unverified | 0 | 0 |
| Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond | Feb 22, 2024 | Form, Medical Question Answering | Unverified | 0 | 0 |
| 70B-parameter large language models in Japanese medical question-answering | Jun 21, 2024 | Continual Pretraining, Domain Adaptation | Unverified | 0 | 0 |
| A Comprehensive Study on Fine-Tuning Large Language Models for Medical Question Answering Using Classification Models and Comparative Analysis | Jan 27, 2025 | Medical Question Answering, Question Answering | Unverified | 0 | 0 |
| Agentic Medical Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge | Feb 18, 2025 | Graph Generation, Knowledge Graphs | Unverified | 0 | 0 |
| AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset | Nov 23, 2024 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| A Grounded Well-being Conversational Agent with Multiple Interaction Modes: Preliminary Results | Nov 28, 2021 | Medical Question Answering, Question Answering | Unverified | 0 | 0 |
| AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow | Sep 27, 2024 | Medical Question Answering, Question Answering | Unverified | 0 | 0 |
| An Empirical Evaluation of Large Language Models on Consumer Health Questions | Dec 31, 2024 | Medical Question Answering, Question Answering | Unverified | 0 | 0 |
| ANU-CSIRO at MEDIQA 2019: Question Answering Using Deep Contextual Knowledge | Aug 1, 2019 | Medical Question Answering, Natural Language Inference | Unverified | 0 | 0 |
| ARS\_NITK at MEDIQA 2019: Analysing Various Methods for Natural Language Inference, Recognising Question Entailment and Medical Question Answering System | Aug 1, 2019 | Information Retrieval, Medical Question Answering | Unverified | 0 | 0 |
| A Survey for Large Language Models in Biomedicine | Aug 29, 2024 | Diagnostic, Drug Discovery | Unverified | 0 | 0 |
| Building a Human-Verified Clinical Reasoning Dataset via a Human LLM Hybrid Pipeline for Trustworthy Medical AI | May 11, 2025 | Medical Question Answering, Question Answering | Unverified | 0 | 0 |
| Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding | Apr 30, 2025 | Medical Question Answering, Question Answering | Unverified | 0 | 0 |
| Challenges of GPT-3-based Conversational Agents for Healthcare | Aug 28, 2023 | Medical Question Answering, MedQA | Unverified | 0 | 0 |
| ClinBench-HPB: A Clinical Benchmark for Evaluating LLMs in Hepato-Pancreato-Biliary Diseases | May 30, 2025 | Medical Question Answering, Multiple-choice | Unverified | 0 | 0 |
| Collaboration among Multiple Large Language Models for Medical Question Answering | May 22, 2025 | Medical Question Answering, Multiple-choice | Unverified | 0 | 0 |
| Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering | Nov 14, 2024 | Medical Question Answering, Misinformation | Unverified | 0 | 0 |