MedCLIP: Contrastive Learning from Unpaired Medical Images and Text (Oct 18, 2022). Tasks: Contrastive Learning, Image-text Retrieval. Code available (2).
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment (Sep 14, 2022). Tasks: Retrieval, Text Retrieval. Code available (2).
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs (Jun 9, 2022). Tasks: Image Captioning, Image Classification. Code available (2).
Egocentric Video-Language Pretraining (Jun 3, 2022). Tasks: Action Recognition, Contrastive Learning. Code available (2).
Cross-lingual and Multilingual CLIP (Jun 1, 2022). Tasks: Contrastive Learning, Image-text Retrieval. Code available (2).
Vision-Language Pre-Training with Triple Contrastive Learning (Feb 21, 2022). Tasks: Contrastive Learning, Cross-modal Alignment. Code available (2).
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models (Apr 17, 2021). Tasks: Argument Retrieval, Benchmarking. Code available (2).
A Replication Study of Dense Passage Retriever (Apr 12, 2021). Tasks: Open-Domain Question Answering, Question Answering. Code available (2).
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning (Mar 2, 2021). Tasks: BIG-bench Machine Learning, Image Retrieval. Code available (2).
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision (Feb 11, 2021). Tasks: Cross-Modal Retrieval, Fine-Grained Image Classification. Code available (2).
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval (Jun 10, 2025). Tasks: Image Captioning, Retrieval. Code available (1).
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models (Jun 10, 2025). Tasks: Contrastive Learning, Image-text Matching. Code available (1).
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts (May 20, 2025). Tasks: Caption Generation, Retrieval. Code available (1).
mmRAG: A Modular Benchmark for Retrieval-Augmented Generation over Text, Tables, and Knowledge Graphs (May 16, 2025). Tasks: Information Retrieval, Knowledge Graphs. Code available (1).
Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models (Mar 25, 2025). Tasks: Benchmarking, Image Captioning. Code available (1).
GOAL: Global-local Object Alignment Learning (Mar 22, 2025). Tasks: Descriptive, Object. Code available (1).
PeerQA: A Scientific Question Answering Dataset from Peer Reviews (Feb 19, 2025). Tasks: Answerability Prediction, Answer Generation. Code available (1).
GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search (Dec 30, 2024). Tasks: RAG, Retrieval. Code available (1).
I0T: Embedding Standardization Method Towards Zero Modality Gap (Dec 18, 2024). Tasks: Contrastive Learning, Image-text Retrieval. Code available (1).
CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval (Dec 17, 2024). Tasks: Contrastive Learning, Information Retrieval. Code available (1).
A Survey of Medical Vision-and-Language Applications and Their Techniques (Nov 19, 2024). Tasks: Decision Making, Diagnostic. Code available (1).
Nearest Neighbor Normalization Improves Multimodal Retrieval (Oct 31, 2024). Tasks: Cross-Modal Retrieval, Image Captioning. Code available (1).
Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval (Oct 9, 2024). Tasks: Retrieval, Text Retrieval. Code available (1).
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds (Sep 13, 2024). Tasks: Audio Classification, Descriptive. Code available (1).
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark (Aug 5, 2024). Tasks: Dense Video Captioning, Diversity. Code available (1).
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval (Aug 1, 2024). Tasks: Attribute, Optical Character Recognition. Code available (1).
Learning Video Context as Interleaved Multimodal Sequences (Jul 31, 2024). Tasks: Language Modeling. Code available (1).
Video-Language Alignment via Spatio-Temporal Graph Transformer (Jul 16, 2024). Tasks: Contrastive Learning, Question Answering. Code available (1).
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation (Jul 1, 2024). Tasks: Image-text Retrieval, Question Answering. Code available (1).
SignCLIP: Connecting Text and Sign Language by Contrastive Learning (Jul 1, 2024). Tasks: Contrastive Learning, Retrieval. Code available (1).
Composing Object Relations and Attributes for Image-Text Matching (Jun 17, 2024). Tasks: Attribute, Graph Attention. Code available (1).
Bridging Language Gaps in Audio-Text Retrieval (Jun 11, 2024). Tasks: AudioCaps, Retrieval. Code available (1).
Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval (May 29, 2024). Tasks: Cross-modal Alignment, Image-text Retrieval. Code available (1).
LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space (May 28, 2024). Tasks: Contrastive Learning, Decoder. Code available (1).
Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration (May 26, 2024). Tasks: Information Retrieval, Retrieval. Code available (1).
PIR: Remote Sensing Image-Text Retrieval with Prior Instruction Representation Learning (May 16, 2024). Tasks: Image-text Retrieval, Representation Learning. Code available (1).
Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation (May 16, 2024). Tasks: AudioCaps, Event Detection. Code available (1).
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering (Mar 26, 2024). Tasks: Benchmarking, Machine Reading Comprehension. Code available (1).
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning (Mar 19, 2024). Tasks: Diagnostic, Image Classification. Code available (1).
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory (Mar 19, 2024). Tasks: Adversarial Text, Diversity. Code available (1).
Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control (Feb 27, 2024). Tasks: GPU, Image Retrieval. Code available (1).
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration (Feb 18, 2024). Tasks: Multi-hop Question Answering, Question Answering. Code available (1).
Mitigating the Impact of False Negatives in Dense Retrieval with Contrastive Confidence Regularization (Dec 30, 2023). Tasks: Answer Generation, Contrastive Learning. Code available (1).
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks (Dec 21, 2023). Tasks: Image Retrieval, Image-to-Text Retrieval. Code available (1).
ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval (Dec 19, 2023). Tasks: Few-Shot Learning, Retrieval. Code available (1).
Data-Efficient Multimodal Fusion on a Single GPU (Dec 15, 2023). Tasks: GPU, Image Retrieval. Code available (1).
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos (Dec 11, 2023). Tasks: Natural Language Moment Retrieval, Natural Language Queries. Code available (1).
Predictive Chemistry Augmented with Text Retrieval (Dec 8, 2023). Tasks: Molecular Representation, Retrieval. Code available (1).
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding (Nov 30, 2023). Tasks: Attribute, Compositional Zero-Shot Learning. Code available (1).
MLLMs-Augmented Visual-Language Representation Learning (Nov 30, 2023). Tasks: Image-text Retrieval, Representation Learning. Code available (1).