ImageBind: One Embedding Space To Bind Them All. May 9, 2023. Tags: All, Cross-Modal Retrieval. Code available (5).
Multimodal Whole Slide Foundation Model for Pathology. Nov 29, 2024. Tags: Cross-Modal Retrieval, model. Code available (4).
AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities. Nov 12, 2022. Tags: Contrastive Learning, Cross-Modal Retrieval. Code available (4).
AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation. Apr 4, 2023. Tags: Cross-Modal Retrieval, Image-text Retrieval. Code available (3).
Merlin: A Vision Language Foundation Model for 3D Computed Tomography. Jun 10, 2024. Tags: 3D Semantic Segmentation, Computed Tomography (CT). Code available (3).
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment. Jan 1, 2024. Tags: cross-modal alignment, Cross-Modal Retrieval. Code available (2).
Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation. Mar 12, 2024. Tags: Cross-Modal Retrieval, GPU. Code available (2).
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation. Sep 30, 2024. Tags: Cross-Modal Retrieval, Dynamic Time Warping. Code available (2).
Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment. Apr 28, 2024. Tags: Cross-Modal Retrieval, Image Retrieval. Code available (2).
Vision-Language Pre-Training with Triple Contrastive Learning. Feb 21, 2022. Tags: Contrastive Learning, cross-modal alignment. Code available (2).
X^2-VLM: All-In-One Pre-trained Model For Vision-Language Tasks. Nov 22, 2022. Tags: All, Cross-Modal Retrieval. Code available (2).
Semantic-Conditional Diffusion Networks for Image Captioning. Dec 6, 2022. Tags: Cross-Modal Retrieval, Decoder. Code available (2).
Comprehending and Ordering Semantics for Image Captioning. Jun 14, 2022. Tags: Cross-Modal Retrieval, Image Captioning. Code available (2).
EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis. Sep 10, 2024. Tags: Contrastive Learning, Cross-Modal Retrieval. Code available (2).
Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology. Mar 19, 2025. Tags: Cross-Modal Retrieval, Diagnostic. Code available (2).
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing. Jun 19, 2023. Tags: Classification, Cross-Modal Retrieval. Code available (2).
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks. Apr 13, 2020. Tags: Cross-Modal Retrieval, Image Captioning. Code available (2).
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing. Jun 20, 2023. Tags: Cross-Modal Retrieval, Image Retrieval. Code available (2).
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing. Dec 20, 2023. Tags: Attribute, Cross-Modal Retrieval. Code available (2).
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset. May 29, 2023. Tags: Audio captioning, Audio-Visual Captioning. Code available (2).
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks. Jun 7, 2023. Tags: Cross-Modal Retrieval, Language Modelling. Code available (2).
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner. May 16, 2025. Tags: Cross-Modal Retrieval, Diagnostic. Code available (2).
MolFM: A Multimodal Molecular Foundation Model. Jun 6, 2023. Tags: Cross-Modal Retrieval, Knowledge Graphs. Code available (2).
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset. Apr 17, 2023. Tags: Audio captioning, Audio-Video Question Answering (AVQA). Code available (2).
Large Language Models are In-Context Molecule Learners. Mar 7, 2024. Tags: Cross-Modal Retrieval, In-Context Learning. Code available (2).
Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language. Jun 9, 2024. Tags: Contrastive Learning, Cross-Modal Retrieval. Code available (2).
PoseScript: Linking 3D Human Poses and Natural Language. Oct 21, 2022. Tags: Cross-Modal Retrieval, Image Captioning. Code available (2).
Composed Multi-modal Retrieval: A Survey of Approaches and Applications. Mar 3, 2025. Tags: Cross-Modal Retrieval, Data Augmentation. Code available (2).
LeanVec: Searching vectors faster by making them fit. Dec 26, 2023. Tags: Cross-Modal Retrieval, Dimensionality Reduction. Code available (2).
Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote Sensing Image Retrieval. Apr 21, 2022. Tags: Cross-Modal Retrieval, Image Retrieval. Code available (2).
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision. Feb 11, 2021. Tags: Cross-Modal Retrieval, Fine-Grained Image Classification. Code available (2).
A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language. Sep 12, 2022. Tags: Contrastive Learning, Cross-Modal Retrieval. Code available (1).
M3-Jepa: Multimodal Alignment via Multi-directional MoE based on the JEPA framework. Sep 9, 2024. Tags: Computational Efficiency, Cross-Modal Retrieval. Code available (1).
A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval. Dec 6, 2022. Tags: Cross-Modal Retrieval, Image-text matching. Code available (1).
Enhancing Recipe Retrieval with Foundation Models: A Data Augmentation Perspective. Dec 8, 2023. Tags: Cross-Modal Retrieval, Data Augmentation. Code available (1).
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks. Mar 4, 2023. Tags: Cross-Modal Retrieval, Image Captioning. Code available (1).
Adaptive label-aware graph convolutional networks for cross-modal retrieval. Aug 6, 2021. Tags: Cross-Modal Retrieval, Representation Learning. Code available (1).
Emotion Embedding Spaces for Matching Music to Stories. Nov 26, 2021. Tags: Cross-Modal Retrieval, Metric Learning. Code available (1).
Dynamic Modality Interaction Modeling for Image-Text Retrieval. Jul 11, 2021. Tags: cross-modal alignment, Cross-Modal Retrieval. Code available (1).
BadCM: Invisible Backdoor Attack Against Cross-Modal Learning. Oct 3, 2024. Tags: Backdoor Attack, Cross-Modal Retrieval. Code available (1).
Dual adversarial graph neural networks for multi-label cross-modal retrieval. May 18, 2021. Tags: Cross-Modal Retrieval, Retrieval. Code available (1).
End-to-end Knowledge Retrieval with Multi-modal Queries. Jun 1, 2023. Tags: Benchmarking, Cross-Modal Retrieval. Code available (1).
FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval. May 20, 2020. Tags: Cross-Modal Retrieval, Retrieval. Code available (1).
Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models. May 31, 2023. Tags: Cross-Modal Retrieval, Question Answering. Code available (1).
BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping. Oct 29, 2023. Tags: Contrastive Learning, Cross-Modal Retrieval. Code available (1).
CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval. May 29, 2024. Tags: Cross-Modal Retrieval, Image Retrieval. Code available (1).
Deep Evidential Learning with Noisy Correspondence for Cross-Modal Retrieval. Oct 10, 2022. Tags: Cross-Modal Retrieval, Cross-modal retrieval with noisy correspondence. Code available (1).
Disentangling and Generating Modalities for Recommendation in Missing Modality Scenarios. Apr 23, 2025. Tags: Cross-Modal Retrieval, Recommendation Systems. Code available (1).
Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment. Jul 18, 2024. Tags: cross-modal alignment, Cross-Modal Retrieval. Code available (1).
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation. Jul 16, 2021. Tags: Cross-Modal Retrieval, Grounded language learning. Code available (1).