| Masked Contrastive Pre-Training for Efficient Video-Text Retrieval | Dec 2, 2022 | Image-text Retrieval, Retrieval | —Unverified | 0 | 0 |
| MASS: Overcoming Language Bias in Image-Text Matching | Jan 20, 2025 | Image-text matching, Image-text Retrieval | —Unverified | 0 | 0 |
| Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval | Jun 26, 2025 | Cross-Modal Retrieval, Image-text Retrieval | —Unverified | 0 | 0 |
| MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval | Oct 30, 2023 | cross-modal alignment, Image-text Retrieval | —Unverified | 0 | 0 |
| Multilateral Semantic Relations Modeling for Image Text Retrieval | Jan 1, 2023 | Image-text Retrieval, Retrieval | —Unverified | 0 | 0 |
| Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval | Dec 17, 2021 | Image-text Retrieval, Retrieval | —Unverified | 0 | 0 |
| NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training | Sep 15, 2024 | Contrastive Learning, cross-modal alignment | —Unverified | 0 | 0 |
| NLIP: Noise-robust Language-Image Pre-training | Dec 14, 2022 | Image Captioning, Image-text Retrieval | —Unverified | 0 | 0 |
| Playing Lottery Tickets with Vision and Language | Apr 23, 2021 | Image-text Retrieval, Question Answering | —Unverified | 0 | 0 |
| Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training | Jun 25, 2021 | Image-text Retrieval, Question Answering | —Unverified | 0 | 0 |
| Progressive Learning for Image Retrieval with Hybrid-Modality Queries | Apr 24, 2022 | Image Retrieval, Image-text Retrieval | —Unverified | 0 | 0 |
| Progressive Local Alignment for Medical Multimodal Pre-training | Feb 25, 2025 | Contrastive Learning, Image-text Retrieval | —Unverified | 0 | 0 |
| Prompt-based Learning for Unpaired Image Captioning | May 26, 2022 | Image Captioning, Image-text Retrieval | —Unverified | 0 | 0 |
| Pushing the Limits of Vision-Language Models in Remote Sensing without Human Annotations | Sep 11, 2024 | Image-text Retrieval, Text Retrieval | —Unverified | 0 | 0 |
| RECLIP: Resource-efficient CLIP by Training with Small Images | Apr 12, 2023 | Contrastive Learning, Image-text Retrieval | —Unverified | 0 | 0 |
| Re-Imagen: Retrieval-Augmented Text-to-Image Generator | Sep 29, 2022 | Image Generation, Image-text Retrieval | —Unverified | 0 | 0 |
| Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval | May 22, 2025 | cross-modal alignment, Image-text Retrieval | —Unverified | 0 | 0 |
| Revising Image-Text Retrieval via Multi-Modal Entailment | Aug 22, 2022 | Image-text Retrieval, Natural Language Inference | —Unverified | 0 | 0 |
| Robust Cross-Modal Representation Learning with Progressive Self-Distillation | Apr 10, 2022 | Contrastive Learning, Image Captioning | —Unverified | 0 | 0 |
| RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data | Oct 23, 2022 | Image Captioning, Image-text Retrieval | —Unverified | 0 | 0 |
| Scale-Semantic Joint Decoupling Network for Image-text Retrieval in Remote Sensing | Dec 12, 2022 | Cross-Modal Retrieval, Image-text Retrieval | —Unverified | 0 | 0 |
| Scene Graph Based Fusion Network For Image-Text Retrieval | Mar 20, 2023 | Image-text Retrieval, Retrieval | —Unverified | 0 | 0 |
| Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement | Apr 6, 2024 | Image-text Retrieval, object-detection | —Unverified | 0 | 0 |
| SeLIP: Similarity Enhanced Contrastive Language Image Pretraining for Multi-modal Head MRI | Mar 25, 2025 | Contrastive Learning, Image Segmentation | —Unverified | 0 | 0 |
| SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features | Feb 20, 2025 | Fairness, Image-text Retrieval | —Unverified | 0 | 0 |