| M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining | Jan 29, 2024 | GPU, zero-shot-classification | Code Available | 0 |
| CAVL: Learning Contrastive and Adaptive Representations of Vision and Language | Apr 10, 2023 | Image Retrieval, Phrase Grounding | Unverified | 0 |
| ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training | Sep 30, 2022 | Computational Efficiency, Contrastive Learning | Code Available | 0 |
| Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset | May 25, 2022 | Image Captioning, Image Retrieval | Unverified | 0 |
| ZSCRGAN: A GAN-based Expectation Maximization Model for Zero-Shot Retrieval of Images from Textual Descriptions | Jul 23, 2020 | Cross-Modal Information Retrieval, Image Retrieval | Code Available | 0 |