| CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval | Feb 15, 2022 | Image-text Retrieval, Representation Learning | Unverified | 0 |
| Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | Feb 14, 2022 | Benchmarking, Contrastive Learning | Code Available | 0 |
| BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | Jan 28, 2022 | Image Captioning, Image-text matching | Code Available | 5 |
| Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval | Dec 17, 2021 | Image-text Retrieval, Retrieval | Unverified | 0 |
| Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation | Dec 10, 2021 | Image-text matching, Image-text Retrieval | Unverified | 0 |
| UFO: A UniFied TransfOrmer for Vision-Language Representation Learning | Nov 19, 2021 | Image Captioning, Image-text matching | Unverified | 0 |
| Constructing Phrase-level Semantic Labels to Form Multi-Grained Supervision for Image-Text Retrieval | Nov 16, 2021 | Form, Image-text Retrieval | Unverified | 0 |
| SwAMP: Swapped Assignment of Multi-Modal Pairs for Cross-Modal Retrieval | Nov 10, 2021 | Contrastive Learning, Cross-Modal Retrieval | Unverified | 0 |
| FILIP: Fine-grained Interactive Language-Image Pre-Training | Nov 9, 2021 | image-classification, Image Classification | Code Available | 1 |
| Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval | Nov 5, 2021 | Image-text Retrieval, Retrieval | Code Available | 0 |