CLIP2Video: Mastering Video-Text Retrieval via Image CLIP Jun 21, 2021 Language Modeling Language Modelling
Code Code Available 15 CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval Apr 18, 2021 Retrieval Text Retrieval
Code Code Available 15 A Comprehensive Review of the Video-to-Text Problem Mar 27, 2021 Question Answering Retrieval
Code Code Available 15 Extending Multi-modal Contrastive Representations Oct 13, 2023 3D Object Classification Representation Learning
Code Code Available 15 Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning Mar 19, 2024 Diagnostic image-classification
Code Code Available 15 Bridging Language Gaps in Audio-Text Retrieval Jun 11, 2024 AudioCaps Retrieval
Code Code Available 15 Exploring Classic and Neural Lexical Translation Models for Information Retrieval: Interpretability, Effectiveness, and Efficiency Benefits Feb 12, 2021 CPU Document Ranking
Code Code Available 15 Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration May 26, 2024 Information Retrieval Retrieval
Code Code Available 15 LinkTransformer: A Unified Package for Record Linkage with Transformer Language Models Sep 2, 2023 Blocking Language Modelling
Code Code Available 15 COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning Oct 27, 2022 Language Modeling Language Modelling
Code Code Available 15 Bridging Video-text Retrieval with Multiple Choice Questions Jan 13, 2022 Action Recognition Linear evaluation
Code Code Available 15 Less is More: Pretrain a Strong Siamese Encoder for Dense Text Retrieval Using a Weak Decoder Nov 1, 2021 Decoder Language Modeling
Code Code Available 15 ComCLIP: Training-Free Compositional Image and Text Matching Nov 25, 2022 Image-text matching Image-text Retrieval
Code Code Available 15 COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark Aug 5, 2024 Dense Video Captioning Diversity
Code Code Available 15 LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Retrieval Feb 6, 2023 Image-text Retrieval Retrieval
Code Code Available 15 Composing Object Relations and Attributes for Image-Text Matching Jun 17, 2024 Attribute Graph Attention
Code Code Available 15 GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition Jan 1, 2021 Image-text Retrieval Medical Image Analysis
Code Code Available 15 Consensus-Aware Visual-Semantic Embedding for Image-Text Matching Jul 17, 2020 Image Captioning Image-text matching
Code Code Available 15 ESA: External Space Attention Aggregation for Image-Text Retrieval Oct 10, 2023 Image-text Retrieval Retrieval
Code Code Available 15 Graph Optimal Transport for Cross-Domain Alignment Jun 26, 2020 Graph Matching Image Captioning
Code Code Available 15 A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval Jun 4, 2021 Graph Matching Image Retrieval
Code Code Available 15 Learning to Rank in Generative Retrieval Jun 27, 2023 Learning-To-Rank Passage Ranking
Code Code Available 15 Nonparametric Decoding for Generative Retrieval Oct 5, 2022 Decoder Language Modelling
Code Code Available 15 Audio Retrieval with Natural Language Queries: A Benchmark Study Dec 17, 2021 AudioCaps Audio captioning
Code Code Available 15 Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding Jun 15, 2023 Contrastive Learning image-classification
Code Code Available 15 Contrastive Audio-Language Learning for Music Aug 25, 2022 Audio to Text Retrieval Descriptive
Code Code Available 15 LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval Jan 1, 2023 image-classification Image Classification
Code Code Available 15 Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory Mar 19, 2024 Adversarial Text Diversity
Code Code Available 15 Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner May 19, 2023 Dense Captioning Image Captioning
Code Code Available 15 Learning Semantic Relationship Among Instances for Image-Text Matching Jan 1, 2023 Cross-Modal Retrieval Image Retrieval
Code Code Available 15 Equivariant Similarity for Vision-Language Foundation Models Mar 25, 2023 Image-text Retrieval Retrieval
Code Code Available 15 Learning the Best Pooling Strategy for Visual Semantic Embedding Nov 9, 2020 Cross-Modal Information Retrieval Image-text Retrieval
Code Code Available 15 LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval Mar 16, 2021 Image-text Retrieval Re-Ranking
Code Code Available 15 LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration Feb 18, 2024 Multi-hop Question Answering Question Answering
Code Code Available 15 Large-Scale Adversarial Training for Vision-and-Language Representation Learning Jun 11, 2020 Image-text Retrieval Question Answering
Code Code Available 15 LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval Mar 11, 2022 Contrastive Learning Re-Ranking
Code Code Available 15 LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space May 28, 2024 Contrastive Learning Decoder
Code Code Available 15 A Data-Centric Framework for Composable NLP Workflows Mar 2, 2021 Retrieval Text Retrieval
Code Code Available 15 Learnable Pillar-based Re-ranking for Image-Text Retrieval Apr 25, 2023 Image-text Retrieval Re-Ranking
Code Code Available 15 Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training Jun 15, 2023 Image-text Retrieval Representation Learning
Code Code Available 15 Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models Jun 10, 2025 Contrastive Learning Image-text matching
Code Code Available 15 Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment Aug 29, 2022 cross-modal alignment Image-text Retrieval
Code Code Available 15 Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training Jun 1, 2022 Contrastive Learning Cross-Lingual Transfer
Code Code Available 15 Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling Apr 14, 2021 GPU Re-Ranking
Code Code Available 15 Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval Oct 11, 2019 Graph Matching Image-text Retrieval
Code Code Available 15 Learning a Text-Video Embedding from Incomplete and Heterogeneous Data Apr 7, 2018 Retrieval Text Retrieval
Code Code Available 15 CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation Jul 1, 2024 Image-text Retrieval Question Answering
Code Code Available 15 Cross-Modal Retrieval with Partially Mismatched Pairs Feb 22, 2023 Contrastive Learning Cross-Modal Retrieval
Code Code Available 15 Cross-Modal Retrieval for Motion and Text via DopTriple Loss May 7, 2023 Cross-Modal Retrieval Retrieval
Code Code Available 15 DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval Jun 10, 2025 Image Captioning Retrieval
Code Code Available 15