| FiCo-ITR: bridging fine-grained and coarse-grained image-text retrieval for comparative performance analysis | Jul 29, 2024 | Image-text Retrieval, Model Selection | Code Available | 0 | 5 |
| A Vision-Language Foundation Model for Leaf Disease Identification | May 11, 2025 | Contrastive Learning, image-classification | Code Available | 0 | 5 |
| HADA: A Graph-based Amalgamation Framework in Image-text Retrieval | Jan 11, 2023 | Graph Neural Network, Image Retrieval | Code Available | 0 | 5 |
| USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval | Jan 17, 2023 | Contrastive Learning, Image-text Retrieval | Code Available | 0 | 5 |
| Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval | Dec 26, 2024 | Image-text Retrieval, Information Retrieval | Code Available | 0 | 5 |
| Adding simple structure at inference improves Vision-Language Compositionality | Jun 11, 2025 | Attribute, Image-text Retrieval | Code Available | 0 | 5 |
| Semantic-Preserving Augmentation for Robust Image-Text Retrieval | Mar 10, 2023 | Image-text Retrieval, Retrieval | Code Available | 0 | 5 |
| Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval | Nov 5, 2021 | Image-text Retrieval, Retrieval | Code Available | 0 | 5 |
| SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features | Feb 20, 2025 | Fairness, Image-text Retrieval | Code Available | 0 | 5 |
| Single-Stream Multi-Level Alignment for Vision-Language Pretraining | Mar 27, 2022 | Image-text Retrieval, Question Answering | Code Available | 0 | 5 |
| Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages | Jun 29, 2023 | Image-text Retrieval, Machine Translation | Code Available | 0 | 5 |
| NAPReg: Nouns As Proxies Regularization for Semantically Aware Cross-Modal Embeddings | Jan 7, 2023 | Cross-Modal Retrieval, Image-text Retrieval | Code Available | 0 | 5 |
| MultiWay-Adapater: Adapting large-scale multi-modal models for scalable image-text retrieval | Sep 4, 2023 | Image-text Retrieval, Retrieval | Code Available | 0 | 5 |
| Intra-Modal Constraint Loss For Image-Text Retrieval | Jul 11, 2022 | Cross-Modal Retrieval, Image-text Retrieval | Code Available | 0 | 5 |
| Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval | Apr 6, 2023 | Cross-Modal Retrieval, Image-text Retrieval | Code Available | 0 | 5 |
| Attacking Attention of Foundation Models Disrupts Downstream Tasks | Jun 3, 2025 | Depth Estimation, Image-text Retrieval | Code Available | 0 | 5 |
| The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision | Apr 26, 2019 | Image-text Retrieval, Object | Code Available | 0 | 5 |
| From Unimodal to Multimodal: Scaling up Projectors to Align Modalities | Sep 28, 2024 | Image-text Retrieval, Semantic Similarity | Code Available | 0 | 5 |
| Multi-stage Pre-training over Simplified Multimodal Pre-training Models | Jul 22, 2021 | Image-text Retrieval, Retrieval | Code Available | 0 | 5 |
| Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval | Jul 17, 2024 | Image-text Retrieval, Object | Code Available | 0 | 5 |
| Dissecting Deep Metric Learning Losses for Image-Text Retrieval | Oct 21, 2022 | Cross-Modal Retrieval, Image-text matching | Code Available | 0 | 5 |
| Enhancing Image-Text Matching with Adaptive Feature Aggregation | Jan 18, 2024 | Image-text matching, Image-text Retrieval | Code Available | 0 | 5 |
| Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | Feb 11, 2023 | Image-text Retrieval, Knowledge Graphs | Code Available | 0 | 5 |
| An Unsupervised Cross-Modal Hashing Method Robust to Noisy Training Image-Text Correspondences in Remote Sensing | Feb 26, 2022 | Image-text Retrieval, Meta-Learning | Code Available | 0 | 5 |
| VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models | Sep 12, 2022 | Attribute, Image-text Retrieval | Code Available | 0 | 5 |
| Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning | Jun 26, 2024 | Contrastive Learning, Cross-Modal Retrieval | Code Available | 0 | 5 |
| Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval | Jun 11, 2018 | Image-text Retrieval, Retrieval | Code Available | 0 | 5 |
| Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning | Jan 30, 2024 | Diversity, Image-text Retrieval | Code Available | 0 | 5 |
| Integrating Listwise Ranking into Pairwise-based Image-Text Retrieval | May 26, 2023 | Image-text Retrieval, Retrieval | Code Available | 0 | 5 |
| Multilingual Vision-Language Pre-training for the Remote Sensing Domain | Oct 30, 2024 | Cross-Modal Retrieval, image-classification | Code Available | 0 | 5 |
| GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning | Oct 20, 2024 | Image Retrieval, Image-text Retrieval | Code Available | 0 | 5 |
| Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | Feb 14, 2022 | Benchmarking, Contrastive Learning | Code Available | 0 | 5 |
| Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning | Oct 12, 2023 | Image Captioning, Image-text Retrieval | Unverified | 0 | 0 |
| MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations | Mar 2, 2025 | image-classification, Image Classification | Unverified | 0 | 0 |
| Active Learning for Finely-Categorized Image-Text Retrieval by Selecting Hard Negative Unpaired Samples | May 25, 2024 | Active Learning, Image-text Retrieval | Unverified | 0 | 0 |
| Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training | Jan 1, 2025 | Image-text Retrieval, Image to text | Unverified | 0 | 0 |
| AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection | Apr 28, 2025 | Adversarial Attack, Anomaly Detection | Unverified | 0 | 0 |
| Anatomy-Aware Conditional Image-Text Retrieval | Mar 10, 2025 | Anatomy, Contrastive Learning | Unverified | 0 | 0 |
| AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models | Oct 7, 2024 | Image Captioning, Image-text Retrieval | Unverified | 0 | 0 |
| Approximate Fiber Product: A Preliminary Algebraic-Geometric Perspective on Multimodal Embedding Alignment | Nov 30, 2024 | Image-text Retrieval, Representation Learning | Unverified | 0 | 0 |
| Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective | Jul 21, 2024 | Image-text Retrieval, Information Retrieval | Unverified | 0 | 0 |
| Asymmetrically Weighted CCA And Hierarchical Kernel Sentence Embedding For Image & Text Retrieval | Nov 19, 2015 | Image-text Retrieval, Model Selection | Unverified | 0 | 0 |
| Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP | Jul 18, 2023 | Attribute, Image-text Retrieval | Unverified | 0 | 0 |
| Barking Up The Syntactic Tree: Enhancing VLM Training with Syntactic Losses | Dec 11, 2024 | Image-text Retrieval, Question Answering | Unverified | 0 | 0 |
| Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval | Jun 9, 2024 | Image-text Retrieval, Person Retrieval | Unverified | 0 | 0 |
| Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models | May 20, 2025 | Image-text Retrieval, Text Retrieval | Unverified | 0 | 0 |
| Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs | Apr 24, 2025 | Image-text Retrieval, Instruction Following | Unverified | 0 | 0 |
| CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval | Aug 21, 2022 | Clustering, Contrastive Learning | Unverified | 0 | 0 |
| CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval | Feb 15, 2022 | Image-text Retrieval, Representation Learning | Unverified | 0 | 0 |
| Filter & Align: Leveraging Human Knowledge to Curate Image-Text Data | Dec 11, 2023 | Image Captioning, Image-text Retrieval | Unverified | 0 | 0 |