| Equivariant Similarity for Vision-Language Foundation Models | Mar 25, 2023 | Image-text Retrieval, Retrieval | Code Available | 1 | 5 |
| FILIP: Fine-grained Interactive Language-Image Pre-Training | Nov 9, 2021 | Image Classification | Code Available | 1 | 5 |
| LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Retrieval | Feb 6, 2023 | Image-text Retrieval, Retrieval | Code Available | 1 | 5 |
| Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval | Oct 11, 2019 | Graph Matching, Image-text Retrieval | Code Available | 1 | 5 |
| Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training | Jun 15, 2023 | Image-text Retrieval, Representation Learning | Code Available | 1 | 5 |
| A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | Jun 4, 2021 | Graph Matching, Image Retrieval | Code Available | 1 | 5 |
| AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning | Aug 14, 2023 | Contrastive Learning, Generative Adversarial Network | Code Available | 1 | 5 |
| CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation | Jul 1, 2024 | Image-text Retrieval, Question Answering | Code Available | 1 | 5 |
| Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Aug 29, 2022 | Cross-modal Alignment, Image-text Retrieval | Code Available | 1 | 5 |
| Composing Object Relations and Attributes for Image-Text Matching | Jun 17, 2024 | Attribute, Graph Attention | Code Available | 1 | 5 |