| Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages | Jun 29, 2023 | Image-text Retrieval, Machine Translation | Code Available | 0 |
| Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input | Jun 25, 2023 | Diversity, Image-text Retrieval | Unverified | 0 |
| Integrating Listwise Ranking into Pairwise-based Image-Text Retrieval | May 26, 2023 | Image-text Retrieval, Retrieval | Code Available | 0 |
| Hypernymization of named entity-rich captions for grounding-based multi-modal pretraining | Apr 25, 2023 | Articles, Image-text Retrieval | Unverified | 0 |
| RECLIP: Resource-efficient CLIP by Training with Small Images | Apr 12, 2023 | Contrastive Learning, Image-text Retrieval | Unverified | 0 |
| Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval | Apr 6, 2023 | Cross-Modal Retrieval, Image-text Retrieval | Code Available | 0 |
| Scene Graph Based Fusion Network For Image-Text Retrieval | Mar 20, 2023 | Image-text Retrieval, Retrieval | Unverified | 0 |
| Efficient Image-Text Retrieval via Keyword-Guided Pre-Screening | Mar 14, 2023 | Image-text Retrieval, Multi-Label Classification | Unverified | 0 |
| Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning | Mar 10, 2023 | Few-Shot Image Classification, image-classification | Unverified | 0 |
| Semantic-Preserving Augmentation for Robust Image-Text Retrieval | Mar 10, 2023 | Image-text Retrieval, Retrieval | Code Available | 0 |