| ALIP: Adaptive Language-Image Pre-training with Synthetic Caption | Aug 16, 2023 | Action ClassificationImage-text Retrieval | CodeCode Available | 1 | 5 |
| Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations | Jun 14, 2023 | image-classificationImage Classification | CodeCode Available | 1 | 5 |
| Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Aug 29, 2022 | cross-modal alignmentImage-text Retrieval | CodeCode Available | 1 | 5 |
| ESA: External Space Attention Aggregation for Image-Text Retrieval | Oct 10, 2023 | Image-text RetrievalRetrieval | CodeCode Available | 1 | 5 |
| FILIP: Fine-grained Interactive Language-Image Pre-Training | Nov 9, 2021 | image-classificationImage Classification | CodeCode Available | 1 | 5 |
| Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | Jul 16, 2021 | Cross-Modal RetrievalGrounded language learning | CodeCode Available | 1 | 5 |
| Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning | Mar 19, 2024 | Diagnosticimage-classification | CodeCode Available | 1 | 5 |
| Dynamic Modality Interaction Modeling for Image-Text Retrieval | Jul 11, 2021 | cross-modal alignmentCross-Modal Retrieval | CodeCode Available | 1 | 5 |
| A Survey of Medical Vision-and-Language Applications and Their Techniques | Nov 19, 2024 | Decision MakingDiagnostic | CodeCode Available | 1 | 5 |
| Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training | Jun 15, 2023 | Image-text RetrievalRepresentation Learning | CodeCode Available | 1 | 5 |