MedCLIP: Contrastive Learning from Unpaired Medical Images and Text | Oct 18, 2022 | Contrastive Learning, Image-text Retrieval | Code available (2)
Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis | Mar 25, 2025 | Contrastive Learning, Image-text Retrieval | Code available (2)
Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing | Dec 21, 2022 | Contrastive Learning, Drug Design | Code available (2)
Gramian Multimodal Representation Learning and Alignment | Dec 16, 2024 | Contrastive Learning, Representation Learning | Code available (2)
Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval | Mar 8, 2024 | Image-text Retrieval, Retrieval | Code available (2)
AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models | Nov 28, 2024 | Audio captioning, Audio to Text Retrieval | Code available (2)
GLAP: General contrastive audio-text pretraining across domains and languages | Jun 12, 2025 | AudioCaps, Keyword Spotting | Code available (2)
M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval | Jan 31, 2024 | Retrieval, Text Retrieval | Code available (2)
TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning | Jun 12, 2025 | Answer Generation, Chunking | Code available (2)
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions | Mar 22, 2024 | Information Retrieval, Retrieval | Code available (2)
Audio Retrieval with WavText5K and CLAP Training | Sep 28, 2022 | AudioCaps, Audio captioning | Code available (1)
Audio Retrieval with Natural Language Queries: A Benchmark Study | Dec 17, 2021 | AudioCaps, Audio captioning | Code available (1)
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | Jul 16, 2021 | Cross-Modal Retrieval, Grounded language learning | Code available (1)
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports | Sep 3, 2020 | Image-text Retrieval, Medical Visual Question Answering | Code available (1)
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping | Apr 26, 2023 | Decoder, Image Captioning | Code available (1)
Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration | May 26, 2024 | Information Retrieval, Retrieval | Code available (1)
FlexiViT: One Model for All Patch Sizes | Dec 15, 2022 | All, Image-text Retrieval | Code available (1)
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark | Aug 5, 2024 | Dense Video Captioning, Diversity | Code available (1)
A Survey of Medical Vision-and-Language Applications and Their Techniques | Nov 19, 2024 | Decision Making, Diagnostic | Code available (1)
COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning | Oct 27, 2022 | Language Modeling, Language Modelling | Code available (1)
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval | Aug 1, 2024 | Attribute, Optical Character Recognition | Code available (1)
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | Apr 1, 2021 | Retrieval, Text Retrieval | Code available (1)
Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network | Jan 1, 2023 | Image-text matching, Retrieval | Code available (1)
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning | Mar 1, 2020 | Cross-Modal Retrieval, Retrieval | Code available (1)
FILIP: Fine-grained Interactive Language-Image Pre-Training | Nov 9, 2021 | image-classification, Image Classification | Code available (1)
Fine-Tuning LLaMA for Multi-Stage Text Retrieval | Oct 12, 2023 | Passage Retrieval, Retrieval | Code available (1)
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Dec 15, 2022 | Benchmarking, Image Captioning | Code available (1)
mmRAG: A Modular Benchmark for Retrieval-Augmented Generation over Text, Tables, and Knowledge Graphs | May 16, 2025 | Information Retrieval, Knowledge Graphs | Code available (1)
Fast and Light-Weight Answer Text Retrieval in Dialogue Systems | May 27, 2022 | Re-Ranking, Retrieval | Code available (1)
Extending Multi-modal Contrastive Representations | Oct 13, 2023 | 3D Object Classification, Representation Learning | Code available (1)
AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning | Aug 14, 2023 | Contrastive Learning, Generative Adversarial Network | Code available (1)
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | Jun 15, 2022 | Described Object Detection, Image Captioning | Code available (1)
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning | Mar 19, 2024 | Diagnostic, image-classification | Code available (1)
FETA: Towards Specializing Foundation Models for Expert Task Applications | Sep 8, 2022 | Domain Generalization, Few-Shot Learning | Code available (1)
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Aug 29, 2022 | cross-modal alignment, Image-text Retrieval | Code available (1)
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Oct 8, 2023 | Action Recognition, Continual Learning | Code available (1)
Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval | Jul 1, 2020 | Contrastive Learning, Passage Retrieval | Code available (1)
A Comprehensive Review of the Video-to-Text Problem | Mar 27, 2021 | Question Answering, Retrieval | Code available (1)
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Jun 10, 2025 | Contrastive Learning, Image-text matching | Code available (1)
Bridging Language Gaps in Audio-Text Retrieval | Jun 11, 2024 | AudioCaps, Retrieval | Code available (1)
Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling | Apr 14, 2021 | GPU, Re-Ranking | Code available (1)
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training | Jun 15, 2023 | Image-text Retrieval, Representation Learning | Code available (1)
A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | Oct 27, 2023 | Cross-Modal Retrieval, Image-text Retrieval | Code available (1)
A Dense Representation Framework for Lexical and Semantic Matching | Jun 20, 2022 | Retrieval, Semantic Text Matching | Code available (1)
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner | May 19, 2023 | Dense Captioning, Image Captioning | Code available (1)
Bridging Video-text Retrieval with Multiple Choice Questions | Jan 13, 2022 | Action Recognition, Linear evaluation | Code available (1)
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Mar 26, 2024 | Benchmarking, Machine Reading Comprehension | Code available (1)
Exploring Classic and Neural Lexical Translation Models for Information Retrieval: Interpretability, Effectiveness, and Efficiency Benefits | Feb 12, 2021 | CPU, Document Ranking | Code available (1)
CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval | Dec 17, 2024 | Contrastive Learning, Information Retrieval | Code available (1)
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset | Dec 8, 2022 | Diversity, Image Description | Code available (1)