| Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Aug 29, 2022 | cross-modal alignment, Image-text Retrieval | Code Available | 1 |
| Revising Image-Text Retrieval via Multi-Modal Entailment | Aug 22, 2022 | Image-text Retrieval, Natural Language Inference | Unverified | 0 |
| CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval | Aug 21, 2022 | Clustering, Contrastive Learning | Unverified | 0 |
| VLMAE: Vision-Language Masked Autoencoder | Aug 19, 2022 | Image-text Retrieval, Language Modeling | Unverified | 0 |
| Intra-Modal Constraint Loss For Image-Text Retrieval | Jul 11, 2022 | Cross-Modal Retrieval, Image-text Retrieval | Code Available | 0 |
| Dynamic Contrastive Distillation for Image-Text Retrieval | Jul 4, 2022 | Contrastive Learning, GPU | Unverified | 0 |
| MixGen: A New Multi-Modal Data Augmentation | Jun 16, 2022 | Data Augmentation, Image-text Retrieval | Code Available | 1 |
| Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | Jun 15, 2022 | Described Object Detection, Image Captioning | Code Available | 1 |
| VL-BEiT: Generative Vision-Language Pretraining | Jun 2, 2022 | image-classification, Image Classification | Unverified | 0 |
| Cross-lingual and Multilingual CLIP | Jun 1, 2022 | Contrastive Learning, Image-text Retrieval | Code Available | 2 |
| Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training | Jun 1, 2022 | Contrastive Learning, Cross-Lingual Transfer | Code Available | 1 |
| Prompt-based Learning for Unpaired Image Captioning | May 26, 2022 | Image Captioning, Image-text Retrieval | Unverified | 0 |
| Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset | May 25, 2022 | Image Captioning, Image Retrieval | Unverified | 0 |
| HiVLP: Hierarchical Vision-Language Pre-Training for Fast Image-Text Retrieval | May 24, 2022 | Cross-Modal Retrieval, Image-text Retrieval | Unverified | 0 |
| mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | May 24, 2022 | Computational Efficiency, cross-modal alignment | Code Available | 1 |
| CCMB: A Large-scale Chinese Cross-modal Benchmark | May 8, 2022 | image-classification, Image Classification | Code Available | 1 |
| Progressive Learning for Image Retrieval with Hybrid-Modality Queries | Apr 24, 2022 | Image Retrieval, Image-text Retrieval | Unverified | 0 |
| COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval | Apr 15, 2022 | Contrastive Learning, Cross-Modal Retrieval | Unverified | 0 |
| Robust Cross-Modal Representation Learning with Progressive Self-Distillation | Apr 10, 2022 | Contrastive Learning, Image Captioning | Unverified | 0 |
| Image-text Retrieval: A Survey on Recent Research and Development | Mar 28, 2022 | Image-text Retrieval, Retrieval | Unverified | 0 |
| Single-Stream Multi-Level Alignment for Vision-Language Pretraining | Mar 27, 2022 | Image-text Retrieval, Question Answering | Code Available | 0 |
| LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval | Mar 10, 2022 | Image-text Retrieval, Retrieval | Unverified | 0 |
| Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval | Mar 8, 2022 | Image-text Retrieval, Information Retrieval | Code Available | 1 |
| An Unsupervised Cross-Modal Hashing Method Robust to Noisy Training Image-Text Correspondences in Remote Sensing | Feb 26, 2022 | Image-text Retrieval, Meta-Learning | Code Available | 0 |
| Vision-Language Pre-Training with Triple Contrastive Learning | Feb 21, 2022 | Contrastive Learning, cross-modal alignment | Code Available | 2 |