| Flamingo: a Visual Language Model for Few-Shot Learning | Apr 29, 2022 | Few-Shot LearningGenerative Visual Question Answering | CodeCode Available | 4 | 5 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Jan 30, 2023 | Generative Visual Question AnsweringImage Captioning | CodeCode Available | 4 | 5 |
| OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM | Feb 14, 2024 | Medical Visual Question AnsweringQuestion Answering | CodeCode Available | 4 | 5 |
| MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine | Aug 6, 2024 | Medical Visual Question AnsweringOrgan Detection | CodeCode Available | 3 | 5 |
| BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities | Dec 10, 2024 | Medical Visual Question AnsweringQuestion Answering | CodeCode Available | 2 | 5 |
| BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks | May 26, 2023 | Image CaptioningMedical Visual Question Answering | CodeCode Available | 2 | 5 |
| MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis | Mar 22, 2024 | Medical DiagnosisMedical Visual Question Answering | CodeCode Available | 2 | 5 |
| PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents | Mar 13, 2023 | image-classificationImage Classification | CodeCode Available | 2 | 5 |
| Med-Flamingo: a Multimodal Medical Few-shot Learner | Jul 27, 2023 | Medical Visual Question AnsweringQuestion Answering | CodeCode Available | 2 | 5 |
| PeFoMed: Parameter Efficient Fine-tuning of Multimodal Large Language Models for Medical Imaging | Jan 5, 2024 | Medical Report GenerationMedical Visual Question Answering | CodeCode Available | 2 | 5 |
| A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports | Sep 3, 2020 | Image-text RetrievalMedical Visual Question Answering | CodeCode Available | 1 | 5 |
| Masked Vision and Language Pre-training with Unimodal and Multimodal Contrastive Losses for Medical Visual Question Answering | Jul 11, 2023 | Language ModelingMedical Visual Question Answering | CodeCode Available | 1 | 5 |
| Localized Questions in Medical Visual Question Answering | Jul 3, 2023 | Medical Visual Question AnsweringQuestion Answering | CodeCode Available | 1 | 5 |
| MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration | Oct 6, 2024 | Medical Visual Question AnsweringQuestion Answering | CodeCode Available | 1 | 5 |
| MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks | May 18, 2025 | BenchmarkingMedical Visual Question Answering | CodeCode Available | 1 | 5 |
| MISS: A Generative Pretraining and Finetuning Approach for Med-VQA | Jan 10, 2024 | Medical Visual Question AnsweringMulti-Task Learning | CodeCode Available | 1 | 5 |
| Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations | Feb 10, 2024 | DiagnosticHallucination | CodeCode Available | 1 | 5 |
| MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models | Sep 23, 2024 | Medical Visual Question AnsweringQuestion Answering | CodeCode Available | 1 | 5 |
| Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training | May 24, 2021 | Image CaptioningMedical Visual Question Answering | CodeCode Available | 1 | 5 |
| LaPA: Latent Prompt Assist Model For Medical Visual Question Answering | Apr 19, 2024 | Medical Visual Question AnsweringQuestion Answering | CodeCode Available | 1 | 5 |
| A Survey of Medical Vision-and-Language Applications and Their Techniques | Nov 19, 2024 | Decision MakingDiagnostic | CodeCode Available | 1 | 5 |
| MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts | May 18, 2023 | Medical Visual Question AnsweringQuestion Answering | CodeCode Available | 1 | 5 |
| BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs | Mar 2, 2023 | ArticlesMedical Visual Question Answering | CodeCode Available | 1 | 5 |
| EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images | Oct 28, 2023 | Decision MakingMedical Visual Question Answering | CodeCode Available | 1 | 5 |
| MedCoT: Medical Chain of Thought via Hierarchical Expert | Dec 18, 2024 | DiagnosticMedical Visual Question Answering | CodeCode Available | 1 | 5 |