| Improving Adversarial Transferability of Vision-Language Pre-training Models through Collaborative Multimodal Interaction | Mar 16, 2024 | Adversarial Robustness, Image-text Retrieval | Unverified | 0 |
| Enhancing Conceptual Understanding in Multimodal Contrastive Learning through Hard Negative Samples | Mar 5, 2024 | Concept Alignment, Contrastive Learning | Unverified | 0 |
| Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning | Jan 30, 2024 | Diversity, Image-text Retrieval | Code Available | 0 |
| Enhancing Image-Text Matching with Adaptive Feature Aggregation | Jan 18, 2024 | Image-text matching, Image-text Retrieval | Code Available | 0 |
| SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment | Jan 4, 2024 | Image Captioning, image-classification | Unverified | 0 |
| Filter & Align: Leveraging Human Knowledge to Curate Image-Text Data | Dec 11, 2023 | Image Captioning, Image-text Retrieval | Unverified | 0 |
| LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models | Dec 1, 2023 | image-classification, Image Classification | Unverified | 0 |
| IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers | Nov 27, 2023 | Caption Generation, Image-text Retrieval | Unverified | 0 |
| A New Fine-grained Alignment Method for Image-text Matching | Nov 3, 2023 | Image-text matching, Image-text Retrieval | Unverified | 0 |
| MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval | Oct 30, 2023 | cross-modal alignment, Image-text Retrieval | Unverified | 0 |
| Direction-Oriented Visual-semantic Embedding Model for Remote Sensing Image-text Retrieval | Oct 12, 2023 | Cross-Modal Retrieval, Image-text Retrieval | Unverified | 0 |
| Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning | Oct 12, 2023 | Image Captioning, Image-text Retrieval | Unverified | 0 |
| Constructing Image-Text Pair Dataset from Books | Oct 3, 2023 | Image-text Retrieval, Optical Character Recognition (OCR) | Unverified | 0 |
| Dual Relation Alignment for Composed Image Retrieval | Sep 5, 2023 | Image Retrieval, Image-text Retrieval | Unverified | 0 |
| MultiWay-Adapater: Adapting large-scale multi-modal models for scalable image-text retrieval | Sep 4, 2023 | Image-text Retrieval, Retrieval | Code Available | 0 |
| Contrastive Feature Masking Open-Vocabulary Vision Transformer | Sep 2, 2023 | Contrastive Learning, Image-text Retrieval | Unverified | 0 |
| DLIP: Distilling Language-Image Pre-training | Aug 24, 2023 | Image Captioning, Image-text Retrieval | Unverified | 0 |
| EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE | Aug 23, 2023 | Image-text matching, Image-text Retrieval | Unverified | 0 |
| Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks | Aug 13, 2023 | Contrastive Learning, image-classification | Unverified | 0 |
| Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP | Jul 18, 2023 | Attribute, Image-text Retrieval | Unverified | 0 |
| Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages | Jun 29, 2023 | Image-text Retrieval, Machine Translation | Code Available | 0 |
| Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input | Jun 25, 2023 | Diversity, Image-text Retrieval | Unverified | 0 |
| Integrating Listwise Ranking into Pairwise-based Image-Text Retrieval | May 26, 2023 | Image-text Retrieval, Retrieval | Code Available | 0 |
| Hypernymization of named entity-rich captions for grounding-based multi-modal pretraining | Apr 25, 2023 | Articles, Image-text Retrieval | Unverified | 0 |
| RECLIP: Resource-efficient CLIP by Training with Small Images | Apr 12, 2023 | Contrastive Learning, Image-text Retrieval | Unverified | 0 |
| Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval | Apr 6, 2023 | Cross-Modal Retrieval, Image-text Retrieval | Code Available | 0 |
| Scene Graph Based Fusion Network For Image-Text Retrieval | Mar 20, 2023 | Image-text Retrieval, Retrieval | Unverified | 0 |
| Efficient Image-Text Retrieval via Keyword-Guided Pre-Screening | Mar 14, 2023 | Image-text Retrieval, Multi-Label Classification | Unverified | 0 |
| Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning | Mar 10, 2023 | Few-Shot Image Classification, image-classification | Unverified | 0 |
| Semantic-Preserving Augmentation for Robust Image-Text Retrieval | Mar 10, 2023 | Image-text Retrieval, Retrieval | Code Available | 0 |
| The style transformer with common knowledge optimization for image-text retrieval | Mar 1, 2023 | Image-text Retrieval, Retrieval | Unverified | 0 |
| Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | Feb 11, 2023 | Image-text Retrieval, Knowledge Graphs | Code Available | 0 |
| USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval | Jan 17, 2023 | Contrastive Learning, Image-text Retrieval | Code Available | 0 |
| HADA: A Graph-based Amalgamation Framework in Image-text Retrieval | Jan 11, 2023 | Graph Neural Network, Image Retrieval | Code Available | 0 |
| NAPReg: Nouns As Proxies Regularization for Semantically Aware Cross-Modal Embeddings | Jan 7, 2023 | Cross-Modal Retrieval, Image-text Retrieval | Code Available | 0 |
| VL-Match: Enhancing Vision-Language Pretraining with Token-Level and Instance-Level Matching | Jan 1, 2023 | Image-text matching, Image-text Retrieval | Unverified | 0 |
| Multilateral Semantic Relations Modeling for Image Text Retrieval | Jan 1, 2023 | Image-text Retrieval, Retrieval | Unverified | 0 |
| GAFNet: A Global Fourier Self Attention Based Novel Network for multi-modal downstream tasks | Jan 1, 2023 | Image Generation, Image-text Retrieval | Unverified | 0 |
| ViLEM: Visual-Language Error Modeling for Image-Text Retrieval | Jan 1, 2023 | Contrastive Learning, Image-text Retrieval | Unverified | 0 |
| Efficient Image Captioning for Edge Devices | Dec 18, 2022 | CPU, Image Captioning | Unverified | 0 |
| HGAN: Hierarchical Graph Alignment Network for Image-Text Retrieval | Dec 16, 2022 | Image-text Retrieval, Retrieval | Unverified | 0 |
| NLIP: Noise-robust Language-Image Pre-training | Dec 14, 2022 | Image Captioning, Image-text Retrieval | Unverified | 0 |
| Scale-Semantic Joint Decoupling Network for Image-text Retrieval in Remote Sensing | Dec 12, 2022 | Cross-Modal Retrieval, Image-text Retrieval | Unverified | 0 |
| Masked Contrastive Pre-Training for Efficient Video-Text Retrieval | Dec 2, 2022 | Image-text Retrieval, Retrieval | Unverified | 0 |
| Generative Negative Text Replay for Continual Vision-Language Pretraining | Oct 31, 2022 | Continual Learning, image-classification | Unverified | 0 |
| RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data | Oct 23, 2022 | Image Captioning, Image-text Retrieval | Code Available | 0 |
| Dissecting Deep Metric Learning Losses for Image-Text Retrieval | Oct 21, 2022 | Cross-Modal Retrieval, Image-text matching | Code Available | 0 |
| Image-Text Retrieval with Binary and Continuous Label Supervision | Oct 20, 2022 | Image Captioning, Image-text Retrieval | Unverified | 0 |
| CPL: Counterfactual Prompt Learning for Vision and Language Models | Oct 19, 2022 | counterfactual, image-classification | Unverified | 0 |
| MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning | Oct 9, 2022 | Image-text Retrieval, multimodal interaction | Unverified | 0 |