mmRAG: A Modular Benchmark for Retrieval-Augmented Generation over Text, Tables, and Knowledge Graphs (May 16, 2025). Tags: Information Retrieval, Knowledge Graphs
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections (May 24, 2022). Tags: Computational Efficiency, cross-modal alignment
GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition (Jan 1, 2021). Tags: Image-text Retrieval, Medical Image Analysis
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift (Dec 15, 2022). Tags: Benchmarking, Image Captioning
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval (Apr 18, 2021). Tags: Retrieval, Text Retrieval
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner (May 19, 2023). Tags: Dense Captioning, Image Captioning
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations (Jun 14, 2023). Tags: image-classification, Image Classification
Equivariant Similarity for Vision-Language Foundation Models (Mar 25, 2023). Tags: Image-text Retrieval, Retrieval
ESA: External Space Attention Aggregation for Image-Text Retrieval (Oct 10, 2023). Tags: Image-text Retrieval, Retrieval
CLIP-Lite: Information Efficient Visual Representation Learning with Language Supervision (Dec 14, 2021). Tags: Contrastive Learning, Representation Learning
Nearest Neighbor Normalization Improves Multimodal Retrieval (Oct 31, 2024). Tags: Cross-Modal Retrieval, Image Captioning
GOAL: Global-local Object Alignment Learning (Mar 22, 2025). Tags: Descriptive, Object
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds (Sep 13, 2024). Tags: Audio Classification, Descriptive
Vision-Language Dataset Distillation (Aug 15, 2023). Tags: Dataset Distillation, image-classification
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone (Jun 15, 2022). Tags: Described Object Detection, Image Captioning
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers (May 27, 2023). Tags: Image Captioning, Image Retrieval
Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration (May 26, 2024). Tags: Information Retrieval, Retrieval
Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark (Jun 10, 2023). Tags: Image-text Retrieval, Medical Report Generation
Generative Multi-hop Retrieval (Apr 27, 2022). Tags: Decoder, GPU
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption (Aug 16, 2023). Tags: Action Classification, Image-text Retrieval
CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback (Jun 19, 2021). Tags: Image Retrieval, Image-text Retrieval
COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning (Oct 27, 2022). Tags: Language Modeling, Language Modelling
GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search (Dec 30, 2024). Tags: RAG, Retrieval
ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval (Dec 19, 2023). Tags: Few-Shot Learning, Retrieval
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions (May 28, 2023). Tags: Attribute, Image Captioning
Learning Video Context as Interleaved Multimodal Sequences (Jul 31, 2024). Tags: Language Modeling, Language Modelling
Language-agnostic BERT Sentence Embedding (Jul 3, 2020). Tags: Language Modeling, Language Modelling
ComCLIP: Training-Free Compositional Image and Text Matching (Nov 25, 2022). Tags: Image-text matching, Image-text Retrieval
FETA: Towards Specializing Foundation Models for Expert Task Applications (Sep 8, 2022). Tags: Domain Generalization, Few-Shot Learning
GLEN: Generative Retrieval via Lexical Index Learning (Nov 6, 2023). Tags: Learning-To-Rank, Retrieval
Mixed-modality Representation Learning and Pre-training for Joint Table-and-Text Retrieval in OpenQA (Oct 11, 2022). Tags: Open-Domain Question Answering, Question Answering
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval (Sep 29, 2023). Tags: Cross-Modal Retrieval, Image-text matching
Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network (Jan 1, 2023). Tags: Image-text matching, Retrieval
Composing Object Relations and Attributes for Image-Text Matching (Jun 17, 2024). Tags: Attribute, Graph Attention
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning (Mar 1, 2020). Tags: Cross-Modal Retrieval, Retrieval
Fine-Tuning LLaMA for Multi-Stage Text Retrieval (Oct 12, 2023). Tags: Passage Retrieval, Retrieval
Consensus-Aware Visual-Semantic Embedding for Image-Text Matching (Jul 17, 2020). Tags: Image Captioning, Image-text matching
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers (May 11, 2023). Tags: Contrastive Learning, Image-text Retrieval
From Unimodal to Multimodal: Scaling up Projectors to Align Modalities (Sep 28, 2024). Tags: Image-text Retrieval, Semantic Similarity
Attacking Attention of Foundation Models Disrupts Downstream Tasks (Jun 3, 2025). Tags: Depth Estimation, Image-text Retrieval
PEFA: Parameter-Free Adapters for Large-scale Embedding-based Retrieval Models (Dec 5, 2023). Tags: Retrieval, Text Retrieval
Partial Scene Text Retrieval (Nov 15, 2024). Tags: Multiple Instance Learning, Retrieval
Pre-trained Language Models Can be Fully Zero-Shot Learners (Dec 14, 2022). Tags: Retrieval, text-classification
On Using GUI Interaction Data to Improve Text Retrieval-based Bug Localization (Oct 12, 2023). Tags: Information Retrieval, Retrieval
ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors (Feb 20, 2025). Tags: AudioCaps, Contrastive Learning
OTE: Exploring Accurate Scene Text Recognition Using One Token (Jan 1, 2024). Tags: Decoder, Scene Text Recognition
Invisible Relevance Bias: Text-Image Retrieval Models Prefer AI-Generated Images (Nov 23, 2023). Tags: Cross-Modal Retrieval, Image Retrieval
FiCo-ITR: bridging fine-grained and coarse-grained image-text retrieval for comparative performance analysis (Jul 29, 2024). Tags: Image-text Retrieval, Model Selection
PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts (May 24, 2023). Tags: Dialogue State Tracking, Image Retrieval
Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval (Jul 17, 2024). Tags: Image-text Retrieval, Object