
ViLMedic: a framework for research at the intersection of vision and language in medical AI

2022-05-01 · ACL 2022 · Code Available

Jean-Benoit Delbrouck, Khaled Saab, Maya Varma, Sabri Eyuboglu, Pierre Chambon, Jared Dunnmon, Juan Zambrano, Akshay Chaudhari, Curtis Langlotz

Abstract

There is a growing need to model interactions between data modalities (e.g., vision, language), both to improve AI predictions on existing tasks and to enable new applications. In the recent field of multimodal medical AI, integrating multiple modalities has gained widespread popularity, as multimodal models have been shown to improve performance and robustness, require fewer training samples, and add complementary information. To improve technical reproducibility and transparency for multimodal medical tasks, as well as to speed up progress across medical AI, we present ViLMedic, a Vision-and-Language medical library. As of 2022, the library contains a dozen reference implementations replicating state-of-the-art results for problems ranging from medical visual question answering and radiology report generation to multimodal representation learning on widely adopted medical datasets. In addition, ViLMedic hosts a model zoo with more than twenty pretrained models for the above tasks, designed to be extensible by researchers but also simple for practitioners. Ultimately, we hope our reproducible pipelines can enable clinical translation and create real impact. The library is available at https://github.com/jbdel/vilmedic.
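As a rough illustration of the "simple for practitioners" claim, the sketch below loads a pretrained model from the model zoo through a Hugging Face-style entry point. The `AutoModel` name, the checkpoint identifier, and the inference calls are assumptions based on the model-zoo description above, not the confirmed ViLMedic API; consult the repository for the actual interface.

```python
# Minimal usage sketch of the ViLMedic model zoo. The checkpoint name and
# the exact processor/generate calls are assumptions; see
# https://github.com/jbdel/vilmedic for the real interface and the list
# of available pretrained models.
from vilmedic import AutoModel

# Load a pretrained radiology report generation model from the model zoo
# ("rrg/baseline-mimic" is a hypothetical checkpoint identifier).
model, processor = AutoModel.from_pretrained("rrg/baseline-mimic")

# Encode a chest X-ray and generate a report (hypothetical calls).
batch = processor.inference(image=["example_cxr.png"])
report = model.generate(**batch)
print(report)
```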
