Data-Efficient Multimodal Fusion on a Single GPU (Dec 15, 2023). Tags: GPU, Image Retrieval
Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network (Jan 1, 2023). Tags: Image-text matching, Retrieval
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning (Mar 1, 2020). Tags: Cross-Modal Retrieval, Retrieval
Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval (Oct 9, 2024). Tags: Retrieval, Text Retrieval
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation (Jul 1, 2024). Tags: Image-text Retrieval, Question Answering
FILIP: Fine-grained Interactive Language-Image Pre-Training (Nov 9, 2021). Tags: image-classification, Image Classification
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval (Apr 1, 2021). Tags: Retrieval, Text Retrieval
Fine-Tuning LLaMA for Multi-Stage Text Retrieval (Oct 12, 2023). Tags: Passage Retrieval, Retrieval
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training (Jun 1, 2022). Tags: Contrastive Learning, Cross-Lingual Transfer
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval (Oct 11, 2019). Tags: Graph Matching, Image-text Retrieval
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions (May 28, 2023). Tags: Attribute, Image Captioning
Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark (Jun 10, 2023). Tags: Image-text Retrieval, Medical Report Generation
Cross-Modal Retrieval with Partially Mismatched Pairs (Feb 22, 2023). Tags: Contrastive Learning, Cross-Modal Retrieval
Cross-Modal Retrieval for Motion and Text via DropTriple Loss (May 7, 2023). Tags: Cross-Modal Retrieval, Retrieval
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset (Dec 8, 2022). Tags: Diversity, Image Description
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory (Mar 19, 2024). Tags: Adversarial Text, Diversity
Extending Multi-modal Contrastive Representations (Oct 13, 2023). Tags: 3D Object Classification, Representation Learning
Exploring Classic and Neural Lexical Translation Models for Information Retrieval: Interpretability, Effectiveness, and Efficiency Benefits (Feb 12, 2021). Tags: CPU, Document Ranking
GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition (Jan 1, 2021). Tags: Image-text Retrieval, Medical Image Analysis
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval (Jun 10, 2025). Tags: Image Captioning, Retrieval
Graph Optimal Transport for Cross-Domain Alignment (Jun 26, 2020). Tags: Graph Matching, Image Captioning
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning (Mar 19, 2024). Tags: Diagnostic, image-classification
Bridging Video-text Retrieval with Multiple Choice Questions (Jan 13, 2022). Tags: Action Recognition, Linear evaluation
Vision-Language Dataset Distillation (Aug 15, 2023). Tags: Dataset Distillation, image-classification
Multimodal Federated Learning via Contrastive Representation Ensemble (Feb 17, 2023). Tags: Federated Learning, Image-text Retrieval
Cross-modal Contrastive Learning for Speech Translation (May 5, 2022). Tags: Contrastive Learning, Retrieval
Bridging Language Gaps in Audio-Text Retrieval (Jun 11, 2024). Tags: AudioCaps, Retrieval
HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking (May 21, 2022). Tags: Passage Ranking, Passage Re-Ranking
Image-text Retrieval via Preserving Main Semantics of Vision (Apr 20, 2023). Tags: Cross-Modal Retrieval, Image-text Retrieval
ESA: External Space Attention Aggregation for Image-Text Retrieval (Oct 10, 2023). Tags: Image-text Retrieval, Retrieval
A Comprehensive Review of the Video-to-Text Problem (Mar 27, 2021). Tags: Question Answering, Retrieval
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss (Sep 9, 2021). Tags: Mixture-of-Experts, Retrieval
Fast and Light-Weight Answer Text Retrieval in Dialogue Systems (May 27, 2022). Tags: Re-Ranking, Retrieval
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations (Jun 14, 2023). Tags: image-classification, Image Classification
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data (Oct 8, 2023). Tags: Action Recognition, Continual Learning
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks (Dec 21, 2023). Tags: Image Retrieval, Image-to-Text Retrieval
A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval (Oct 27, 2023). Tags: Cross-Modal Retrieval, Image-text Retrieval
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain (Mar 30, 2021). Tags: Image Retrieval, Retrieval
Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control (Feb 27, 2024). Tags: GPU, Image Retrieval
Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling (Apr 14, 2021). Tags: GPU, Re-Ranking
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers (May 27, 2023). Tags: Image Captioning, Image Retrieval
AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning (Aug 14, 2023). Tags: Contrastive Learning, Generative Adversarial Network
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner (May 19, 2023). Tags: Dense Captioning, Image Captioning
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training (Jun 15, 2023). Tags: Image-text Retrieval, Representation Learning
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment (Aug 29, 2022). Tags: cross-modal alignment, Image-text Retrieval
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption (Aug 16, 2023). Tags: Action Classification, Image-text Retrieval
Learnable Pillar-based Re-ranking for Image-Text Retrieval (Apr 25, 2023). Tags: Image-text Retrieval, Re-Ranking
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data (Apr 7, 2018). Tags: Retrieval, Text Retrieval
CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval (Dec 17, 2024). Tags: Contrastive Learning, Information Retrieval
CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback (Jun 19, 2021). Tags: Image Retrieval, Image-text Retrieval