Data-Efficient Multimodal Fusion on a Single GPU Dec 15, 2023 GPU Image Retrieval
Graph Optimal Transport for Cross-Domain Alignment Jun 26, 2020 Graph Matching Image Captioning
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation Jul 1, 2024 Image-text Retrieval Question Answering
Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval Oct 9, 2024 Retrieval Text Retrieval
Hyperbolic Image-Text Representations Apr 18, 2023 image-classification Image Classification
I0T: Embedding Standardization Method Towards Zero Modality Gap Dec 18, 2024 Contrastive Learning Image-text Retrieval
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss Sep 9, 2021 Mixture-of-Experts Retrieval
IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval Mar 8, 2020 Cross-Modal Retrieval Image-text Retrieval
HANet: Hierarchical Alignment Networks for Video-Text Retrieval Jul 26, 2021 Retrieval Text Matching
Equivariant Similarity for Vision-Language Foundation Models Mar 25, 2023 Image-text Retrieval Retrieval
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training Jun 1, 2022 Contrastive Learning Cross-Lingual Transfer
GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition Jan 1, 2021 Image-text Retrieval Medical Image Analysis
Densifying Sparse Representations for Passage Retrieval by Representational Slicing Dec 9, 2021 Passage Retrieval Retrieval
LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space May 28, 2024 Contrastive Learning Decoder
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset Dec 8, 2022 Diversity Image Description
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory Mar 19, 2024 Adversarial Text Diversity
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval Oct 11, 2019 Graph Matching Image-text Retrieval
Learning the Best Pooling Strategy for Visual Semantic Embedding Nov 9, 2020 Cross-Modal Information Retrieval Image-text Retrieval
Less is More: Pretrain a Strong Siamese Encoder for Dense Text Retrieval Using a Weak Decoder Nov 1, 2021 Decoder Language Modeling
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval Jun 10, 2025 Image Captioning Retrieval
GOAL: Global-local Object Alignment Learning Mar 22, 2025 Descriptive Object
LinkTransformer: A Unified Package for Record Linkage with Transformer Language Models Sep 2, 2023 Blocking Language Modelling
Bridging Video-text Retrieval with Multiple Choice Questions Jan 13, 2022 Action Recognition Linear evaluation
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model Aug 15, 2023 Decoder Object
Knowledge Guided Text Retrieval and Reading for Open Domain Question Answering Nov 10, 2019 Natural Questions Open-Domain Question Answering
Cross-Modal Retrieval with Partially Mismatched Pairs Feb 22, 2023 Contrastive Learning Cross-Modal Retrieval
Bridging Language Gaps in Audio-Text Retrieval Jun 11, 2024 AudioCaps Retrieval
Cross-Modal Retrieval for Motion and Text via DopTriple Loss May 7, 2023 Cross-Modal Retrieval Retrieval
Mitigating the Impact of False Negatives in Dense Retrieval with Contrastive Confidence Regularization Dec 30, 2023 Answer Generation Contrastive Learning
Generative Multi-hop Retrieval Apr 27, 2022 Decoder GPU
A Comprehensive Review of the Video-to-Text Problem Mar 27, 2021 Question Answering Retrieval
MixGen: A New Multi-Modal Data Augmentation Jun 16, 2022 Data Augmentation Image-text Retrieval
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions May 28, 2023 Attribute Image Captioning
GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search Dec 30, 2024 RAG Retrieval
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data Oct 8, 2023 Action Recognition Continual Learning
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections May 24, 2022 Computational Efficiency cross-modal alignment
Cross-modal Contrastive Learning for Speech Translation May 5, 2022 Contrastive Learning Retrieval
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration Apr 17, 2022 Navigate Retrieval
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping Apr 26, 2023 Decoder Image Captioning
Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling Apr 14, 2021 GPU Re-Ranking
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations Jun 14, 2023 image-classification Image Classification
AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning Aug 14, 2023 Contrastive Learning Generative Adversarial Network
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval Apr 1, 2021 Retrieval Text Retrieval
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training Jun 15, 2023 Image-text Retrieval Representation Learning
GLEN: Generative Retrieval via Lexical Index Learning Nov 6, 2023 Learning-To-Rank Retrieval
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers May 27, 2023 Image Captioning Image Retrieval
Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark Jun 10, 2023 Image-text Retrieval Medical Report Generation
MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval Jan 19, 2023 Retrieval Text Retrieval
FlexiViT: One Model for All Patch Sizes Dec 15, 2022 All Image-text Retrieval
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption Aug 16, 2023 Action Classification Image-text Retrieval