| Paper | Date | Tasks | Code | | |
|---|---|---|---|---|---|
| Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Jun 11, 2020 | Image-text Retrieval, Question Answering | Code Available | 1 | 5 |
| Learnable Pillar-based Re-ranking for Image-Text Retrieval | Apr 25, 2023 | Image-text Retrieval, Re-Ranking | Code Available | 1 | 5 |
| Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | Jun 15, 2022 | Described Object Detection, Image Captioning | Code Available | 1 | 5 |
| LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Retrieval | Feb 6, 2023 | Image-text Retrieval, Retrieval | Code Available | 1 | 5 |
| Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning | Mar 19, 2024 | Diagnostic, image-classification | Code Available | 1 | 5 |
| FETA: Towards Specializing Foundation Models for Expert Task Applications | Sep 8, 2022 | Domain Generalization, Few-Shot Learning | Code Available | 1 | 5 |
| FlexiViT: One Model for All Patch Sizes | Dec 15, 2022 | All, Image-text Retrieval | Code Available | 1 | 5 |
| A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports | Sep 3, 2020 | Image-text Retrieval, Medical Visual Question Answering | Code Available | 1 | 5 |
| From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping | Apr 26, 2023 | Decoder, Image Captioning | Code Available | 1 | 5 |
| GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition | Jan 1, 2021 | Image-text Retrieval, Medical Image Analysis | Code Available | 1 | 5 |