Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning (Mar 1, 2020). Tasks: Cross-Modal Retrieval, Retrieval. Code available.
DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases (Sep 30, 2022). Tasks: Entity Linking, Question Answering. Code available.
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation (Jul 1, 2024). Tasks: Image-text Retrieval, Question Answering. Code available.
Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval (Oct 9, 2024). Tasks: Retrieval, Text Retrieval. Code available.
FILIP: Fine-grained Interactive Language-Image Pre-Training (Nov 9, 2021). Tasks: image-classification, Image Classification. Code available.
Fine-Tuning LLaMA for Multi-Stage Text Retrieval (Oct 12, 2023). Tasks: Passage Retrieval, Retrieval. Code available.
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training (Jun 1, 2022). Tasks: Contrastive Learning, Cross-Lingual Transfer. Code available.
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval (Oct 11, 2019). Tasks: Graph Matching, Image-text Retrieval. Code available.
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions (May 28, 2023). Tasks: Attribute, Image Captioning. Code available.
GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search (Dec 30, 2024). Tasks: RAG, Retrieval. Code available.
Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control (Feb 27, 2024). Tasks: GPU, Image Retrieval. Code available.
Nearest Neighbor Normalization Improves Multimodal Retrieval (Oct 31, 2024). Tasks: Cross-Modal Retrieval, Image Captioning. Code available.
Cross-Modal Retrieval with Partially Mismatched Pairs (Feb 22, 2023). Tasks: Contrastive Learning, Cross-Modal Retrieval. Code available.
Cross-Modal Retrieval for Motion and Text via DopTriple Loss (May 7, 2023). Tasks: Cross-Modal Retrieval, Retrieval. Code available.
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset (Dec 8, 2022). Tasks: Diversity, Image Description. Code available.
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory (Mar 19, 2024). Tasks: Adversarial Text, Diversity. Code available.
Extending Multi-modal Contrastive Representations (Oct 13, 2023). Tasks: 3D Object Classification, Representation Learning. Code available.
Exploring Classic and Neural Lexical Translation Models for Information Retrieval: Interpretability, Effectiveness, and Efficiency Benefits (Feb 12, 2021). Tasks: CPU, Document Ranking. Code available.
GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition (Jan 1, 2021). Tasks: Image-text Retrieval, Medical Image Analysis. Code available.
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval (Jun 10, 2025). Tasks: Image Captioning, Retrieval. Code available.
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning (Mar 19, 2024). Tasks: Diagnostic, image-classification. Code available.
Multi-event Video-Text Retrieval (Aug 22, 2023). Tasks: Language Modelling, Retrieval. Code available.
Bridging Video-text Retrieval with Multiple Choice Questions (Jan 13, 2022). Tasks: Action Recognition, Linear evaluation. Code available.
Cross-modal Contrastive Learning for Speech Translation (May 5, 2022). Tasks: Contrastive Learning, Retrieval. Code available.
ESA: External Space Attention Aggregation for Image-Text Retrieval (Oct 10, 2023). Tasks: Image-text Retrieval, Retrieval. Code available.
Fast and Light-Weight Answer Text Retrieval in Dialogue Systems (May 27, 2022). Tasks: Re-Ranking, Retrieval. Code available.
Bridging Language Gaps in Audio-Text Retrieval (Jun 11, 2024). Tasks: AudioCaps, Retrieval. Code available.
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model (Aug 15, 2023). Tasks: Decoder, Object. Code available.
I0T: Embedding Standardization Method Towards Zero Modality Gap (Dec 18, 2024). Tasks: Contrastive Learning, Image-text Retrieval. Code available.
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations (Jun 14, 2023). Tasks: image-classification, Image Classification. Code available.
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration (Apr 17, 2022). Tasks: Navigate, Retrieval. Code available.
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss (Sep 9, 2021). Tasks: Mixture-of-Experts, Retrieval. Code available.
Vision-Language Dataset Distillation (Aug 15, 2023). Tasks: Dataset Distillation, image-classification. Code available.
Dynamic Modality Interaction Modeling for Image-Text Retrieval (Jul 11, 2021). Tasks: cross-modal alignment, Cross-Modal Retrieval. Code available.
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data (Oct 8, 2023). Tasks: Action Recognition, Continual Learning. Code available.
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks (Dec 21, 2023). Tasks: Image Retrieval, Image-to-Text Retrieval. Code available.
A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval (Oct 27, 2023). Tasks: Cross-Modal Retrieval, Image-text Retrieval. Code available.
Condenser: a Pre-training Architecture for Dense Retrieval (Apr 16, 2021). Tasks: Language Modelling, Retrieval. Code available.
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers (May 27, 2023). Tasks: Image Captioning, Image Retrieval. Code available.
Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling (Apr 14, 2021). Tasks: GPU, Re-Ranking. Code available.
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner (May 19, 2023). Tasks: Dense Captioning, Image Captioning. Code available.
AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning (Aug 14, 2023). Tasks: Contrastive Learning, Generative Adversarial Network. Code available.
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections (May 24, 2022). Tasks: Computational Efficiency, cross-modal alignment. Code available.
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training (Jun 15, 2023). Tasks: Image-text Retrieval, Representation Learning. Code available.
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment (Aug 29, 2022). Tasks: cross-modal alignment, Image-text Retrieval. Code available.
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption (Aug 16, 2023). Tasks: Action Classification, Image-text Retrieval. Code available.
LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space (May 28, 2024). Tasks: Contrastive Learning, Decoder. Code available.
Learnable Pillar-based Re-ranking for Image-Text Retrieval (Apr 25, 2023). Tasks: Image-text Retrieval, Re-Ranking. Code available.
CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval (Dec 17, 2024). Tasks: Contrastive Learning, Information Retrieval. Code available.
CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback (Jun 19, 2021). Tasks: Image Retrieval, Image-text Retrieval. Code available.