| Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Dec 15, 2022 | Benchmarking, Image Captioning | Code Available | 1 |
| Hyperbolic Image-Text Representations | Apr 18, 2023 | Image Classification | Code Available | 1 |
| Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training | Jun 15, 2023 | Image-text Retrieval, Representation Learning | Code Available | 1 |
| Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Aug 29, 2022 | cross-modal alignment, Image-text Retrieval | Code Available | 1 |
| AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning | Aug 14, 2023 | Contrastive Learning, Generative Adversarial Network | Code Available | 1 |
| ESA: External Space Attention Aggregation for Image-Text Retrieval | Oct 10, 2023 | Image-text Retrieval, Retrieval | Code Available | 1 |
| Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Jun 11, 2020 | Image-text Retrieval, Question Answering | Code Available | 1 |
| Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner | May 19, 2023 | Dense Captioning, Image Captioning | Code Available | 1 |
| FlexiViT: One Model for All Patch Sizes | Dec 15, 2022 | All, Image-text Retrieval | Code Available | 1 |
| FILIP: Fine-grained Interactive Language-Image Pre-Training | Nov 9, 2021 | Image Classification | Code Available | 1 |