| Playing Lottery Tickets with Vision and Language | Apr 23, 2021 | Image-text Retrieval, Question Answering | Unverified | 0 |
| Continual learning in cross-modal retrieval | Apr 14, 2021 | Continual Learning, cross-modal alignment | Unverified | 0 |
| UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training | Apr 1, 2021 | Image-text matching, Image-text Retrieval | Unverified | 0 |
| LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval | Mar 16, 2021 | Image-text Retrieval, Re-Ranking | Code Available | 1 |
| WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning | Mar 2, 2021 | BIG-bench Machine Learning, Image Retrieval | Code Available | 2 |
| Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | Feb 11, 2021 | Cross-Modal Retrieval, Fine-Grained Image Classification | Code Available | 2 |
| GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition | Jan 1, 2021 | Image-text Retrieval, Medical Image Analysis | Code Available | 1 |
| Learning the Best Pooling Strategy for Visual Semantic Embedding | Nov 9, 2020 | Cross-Modal Information Retrieval, Image-text Retrieval | Code Available | 1 |
| A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports | Sep 3, 2020 | Image-text Retrieval, Medical Visual Question Answering | Code Available | 1 |
| Graph Optimal Transport for Cross-Domain Alignment | Jun 26, 2020 | Graph Matching, Image Captioning | Code Available | 1 |