A Replication Study of Dense Passage Retriever (Apr 12, 2021). Tags: Open-Domain Question Answering, Question Answering. Code available (25).
Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations (Apr 29, 2024). Tags: Retrieval, Text Retrieval. Code available (25).
Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing (Dec 21, 2022). Tags: Contrastive Learning, Drug Design. Code available (25).
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment (Sep 14, 2022). Tags: Retrieval, Text Retrieval. Code available (25).
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text (Oct 18, 2022). Tags: Contrastive Learning, Image-text Retrieval. Code available (25).
M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval (Jan 31, 2024). Tags: Retrieval, Text Retrieval. Code available (25).
GLAP: General contrastive audio-text pretraining across domains and languages (Jun 12, 2025). Tags: AudioCaps, Keyword Spotting. Code available (25).
AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models (Nov 28, 2024). Tags: Audio captioning, Audio to Text Retrieval. Code available (25).
Gramian Multimodal Representation Learning and Alignment (Dec 16, 2024). Tags: Contrastive Learning, Representation Learning. Code available (25).
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory (May 29, 2025). Tags: Contrastive Learning, Text Retrieval. Code available (25).
Audio Retrieval with WavText5K and CLAP Training (Sep 28, 2022). Tags: AudioCaps, Audio captioning. Code available (15).
Audio Retrieval with Natural Language Queries: A Benchmark Study (Dec 17, 2021). Tags: AudioCaps, Audio captioning. Code available (15).
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation (Jul 16, 2021). Tags: Cross-Modal Retrieval, Grounded language learning. Code available (15).
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports (Sep 3, 2020). Tags: Image-text Retrieval, Medical Visual Question Answering. Code available (15).
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training (Jun 15, 2023). Tags: Image-text Retrieval, Representation Learning. Code available (15).
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment (Aug 29, 2022). Tags: cross-modal alignment, Image-text Retrieval. Code available (15).
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark (Aug 5, 2024). Tags: Dense Video Captioning, Diversity. Code available (15).
A Survey of Medical Vision-and-Language Applications and Their Techniques (Nov 19, 2024). Tags: Decision Making, Diagnostic. Code available (15).
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models (Jun 10, 2025). Tags: Contrastive Learning, Image-text matching. Code available (15).
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping (Apr 26, 2023). Tags: Decoder, Image Captioning. Code available (15).
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval (Apr 1, 2021). Tags: Retrieval, Text Retrieval. Code available (15).
Dynamic Modality Interaction Modeling for Image-Text Retrieval (Jul 11, 2021). Tags: cross-modal alignment, Cross-Modal Retrieval. Code available (15).
FlexiViT: One Model for All Patch Sizes (Dec 15, 2022). Tags: All, Image-text Retrieval. Code available (15).
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift (Dec 15, 2022). Tags: Benchmarking, Image Captioning. Code available (15).
mmRAG: A Modular Benchmark for Retrieval-Augmented Generation over Text, Tables, and Knowledge Graphs (May 16, 2025). Tags: Information Retrieval, Knowledge Graphs. Code available (15).
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning (Mar 1, 2020). Tags: Cross-Modal Retrieval, Retrieval. Code available (15).
Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling (Apr 14, 2021). Tags: GPU, Re-Ranking. Code available (15).
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset (Dec 8, 2022). Tags: Diversity, Image Description. Code available (15).
AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning (Aug 14, 2023). Tags: Contrastive Learning, Generative Adversarial Network. Code available (15).
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval (Jun 10, 2025). Tags: Image Captioning, Retrieval. Code available (15).
Fine-Tuning LLaMA for Multi-Stage Text Retrieval (Oct 12, 2023). Tags: Passage Retrieval, Retrieval. Code available (15).
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval (Aug 1, 2024). Tags: Attribute, Optical Character Recognition. Code available (15).
Dense Hierarchical Retrieval for Open-Domain Question Answering (Oct 28, 2021). Tags: Open-Domain Question Answering, Question Answering. Code available (15).
Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval (Jul 1, 2020). Tags: Contrastive Learning, Passage Retrieval. Code available (15).
FETA: Towards Specializing Foundation Models for Expert Task Applications (Sep 8, 2022). Tags: Domain Generalization, Few-Shot Learning. Code available (15).
A Comprehensive Review of the Video-to-Text Problem (Mar 27, 2021). Tags: Question Answering, Retrieval. Code available (15).
DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases (Sep 30, 2022). Tags: Entity Linking, Question Answering. Code available (15).
Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval (Oct 9, 2024). Tags: Retrieval, Text Retrieval. Code available (15).
Bridging Language Gaps in Audio-Text Retrieval (Jun 11, 2024). Tags: AudioCaps, Retrieval. Code available (15).
Data-Efficient Multimodal Fusion on a Single GPU (Dec 15, 2023). Tags: GPU, Image Retrieval. Code available (15).
Bridging Video-text Retrieval with Multiple Choice Questions (Jan 13, 2022). Tags: Action Recognition, Linear evaluation. Code available (15).
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data (Oct 8, 2023). Tags: Action Recognition, Continual Learning. Code available (15).
A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval (Oct 27, 2023). Tags: Cross-Modal Retrieval, Image-text Retrieval. Code available (15).
A Dense Representation Framework for Lexical and Semantic Matching (Jun 20, 2022). Tags: Retrieval, Semantic Text Matching. Code available (15).
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training (Jun 1, 2022). Tags: Contrastive Learning, Cross-Lingual Transfer. Code available (15).
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation (Jul 1, 2024). Tags: Image-text Retrieval, Question Answering. Code available (15).
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering (Mar 26, 2024). Tags: Benchmarking, Machine Reading Comprehension. Code available (15).
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval (Oct 11, 2019). Tags: Graph Matching, Image-text Retrieval. Code available (15).
CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval (Dec 17, 2024). Tags: Contrastive Learning, Information Retrieval. Code available (15).
A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval (Jun 4, 2021). Tags: Graph Matching, Image Retrieval. Code available (15).