MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval Jan 19, 2023 Retrieval Text Retrieval
Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network Jan 1, 2023 Image-text matching Retrieval
Learning Semantic Relationship Among Instances for Image-Text Matching Jan 1, 2023 Cross-Modal Retrieval Image Retrieval
LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval Jan 1, 2023 image-classification Image Classification
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift Dec 15, 2022 Benchmarking Image Captioning
FlexiViT: One Model for All Patch Sizes Dec 15, 2022 All Image-text Retrieval
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset Dec 8, 2022 Diversity Image Description
ComCLIP: Training-Free Compositional Image and Text Matching Nov 25, 2022 Image-text matching Image-text Retrieval
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning Nov 24, 2022 cross-modal alignment Image-text Retrieval
COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning Oct 27, 2022 Language Modeling Language Modelling
VTC: Improving Video-Text Retrieval with User Comments Oct 19, 2022 Representation Learning Retrieval
Mixed-modality Representation Learning and Pre-training for Joint Table-and-Text Retrieval in OpenQA Oct 11, 2022 Open-Domain Question Answering Question Answering
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model Oct 11, 2022 Contrastive Learning Image-text matching
Nonparametric Decoding for Generative Retrieval Oct 5, 2022 Decoder Language Modelling
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model Oct 3, 2022 Language Modeling Language Modelling
DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases Sep 30, 2022 Entity Linking Question Answering
Audio Retrieval with WavText5K and CLAP Training Sep 28, 2022 AudioCaps Audio captioning
Mr. Right: Multimodal Retrieval on Representation of ImaGe witH Text Sep 28, 2022 Image Captioning Image Retrieval
FETA: Towards Specializing Foundation Models for Expert Task Applications Sep 8, 2022 Domain Generalization Few-Shot Learning
Universal Vision-Language Dense Retrieval: Learning A Unified Representation Space for Multi-Modal Retrieval Sep 1, 2022 Image Retrieval Open-Domain Question Answering
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment Aug 29, 2022 cross-modal alignment Image-text Retrieval
Contrastive Audio-Language Learning for Music Aug 25, 2022 Audio to Text Retrieval Descriptive
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval Jul 15, 2022 Contrastive Learning Retrieval
A Dense Representation Framework for Lexical and Semantic Matching Jun 20, 2022 Retrieval Semantic Text Matching
MixGen: A New Multi-Modal Data Augmentation Jun 16, 2022 Data Augmentation Image-text Retrieval
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone Jun 15, 2022 Described Object Detection Image Captioning
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training Jun 1, 2022 Contrastive Learning Cross-Lingual Transfer
Fast and Light-Weight Answer Text Retrieval in Dialogue Systems May 27, 2022 Re-Ranking Retrieval
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections May 24, 2022 Computational Efficiency cross-modal alignment
HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking May 21, 2022 Passage Ranking Passage Re-Ranking
CCMB: A Large-scale Chinese Cross-modal Benchmark May 8, 2022 image-classification Image Classification
Cross-modal Contrastive Learning for Speech Translation May 5, 2022 Contrastive Learning Retrieval
Generative Multi-hop Retrieval Apr 27, 2022 Decoder GPU
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval Apr 26, 2022 Action Recognition Retrieval
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration Apr 17, 2022 Navigate Retrieval
On Metric Learning for Audio-Text Cross-Modal Retrieval Mar 29, 2022 AudioCaps Cross-Modal Retrieval
LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval Mar 11, 2022 Contrastive Learning Re-Ranking
Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval Mar 8, 2022 Image-text Retrieval Information Retrieval
Bridging Video-text Retrieval with Multiple Choice Questions Jan 13, 2022 Action Recognition Linear evaluation
Audio Retrieval with Natural Language Queries: A Benchmark Study Dec 17, 2021 AudioCaps Audio captioning
CLIP-Lite: Information Efficient Visual Representation Learning with Language Supervision Dec 14, 2021 Contrastive Learning Representation Learning
Densifying Sparse Representations for Passage Retrieval by Representational Slicing Dec 9, 2021 Passage Retrieval Retrieval
Video-Text Pre-training with Learned Regions Dec 2, 2021 Representation Learning Retrieval
FILIP: Fine-grained Interactive Language-Image Pre-Training Nov 9, 2021 image-classification Image Classification
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts Nov 3, 2021 Image Retrieval Image-text Retrieval
Less is More: Pretrain a Strong Siamese Encoder for Dense Text Retrieval Using a Weak Decoder Nov 1, 2021 Decoder Language Modeling
Dense Hierarchical Retrieval for Open-Domain Question Answering Oct 28, 2021 Open-Domain Question Answering Question Answering
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss Sep 9, 2021 Mixture-of-Experts Retrieval
HANet: Hierarchical Alignment Networks for Video-Text Retrieval Jul 26, 2021 Retrieval Text Matching
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation Jul 16, 2021 Cross-Modal Retrieval Grounded language learning