ImageBind: One Embedding Space To Bind Them All (May 9, 2023). Tags: All, Cross-Modal Retrieval. Code Available: 5
Multimodal Whole Slide Foundation Model for Pathology (Nov 29, 2024). Tags: Cross-Modal Retrieval, model. Code Available: 4
AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities (Nov 12, 2022). Tags: Contrastive Learning, Cross-Modal Retrieval. Code Available: 4
Merlin: A Vision Language Foundation Model for 3D Computed Tomography (Jun 10, 2024). Tags: 3D Semantic Segmentation, Computed Tomography (CT). Code Available: 3
AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation (Apr 4, 2023). Tags: Cross-Modal Retrieval, Image-text Retrieval. Code Available: 3
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner (May 16, 2025). Tags: Cross-Modal Retrieval, Diagnostic. Code Available: 2
Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology (Mar 19, 2025). Tags: Cross-Modal Retrieval, Diagnostic. Code Available: 2
Composed Multi-modal Retrieval: A Survey of Approaches and Applications (Mar 3, 2025). Tags: Cross-Modal Retrieval, Data Augmentation. Code Available: 2
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation (Sep 30, 2024). Tags: Cross-Modal Retrieval, Dynamic Time Warping. Code Available: 2
EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis (Sep 10, 2024). Tags: Contrastive Learning, Cross-Modal Retrieval. Code Available: 2
Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language (Jun 9, 2024). Tags: Contrastive Learning, Cross-Modal Retrieval. Code Available: 2
Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment (Apr 28, 2024). Tags: Cross-Modal Retrieval, Image Retrieval. Code Available: 2
Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation (Mar 12, 2024). Tags: Cross-Modal Retrieval, GPU. Code Available: 2
Large Language Models are In-Context Molecule Learners (Mar 7, 2024). Tags: Cross-Modal Retrieval, In-Context Learning. Code Available: 2
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment (Jan 1, 2024). Tags: cross-modal alignment, Cross-Modal Retrieval. Code Available: 2
LeanVec: Searching vectors faster by making them fit (Dec 26, 2023). Tags: Cross-Modal Retrieval, Dimensionality Reduction. Code Available: 2
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing (Dec 20, 2023). Tags: Attribute, Cross-Modal Retrieval. Code Available: 2
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing (Jun 20, 2023). Tags: Cross-Modal Retrieval, Image Retrieval. Code Available: 2
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing (Jun 19, 2023). Tags: Classification, Cross-Modal Retrieval. Code Available: 2
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks (Jun 7, 2023). Tags: Cross-Modal Retrieval, Language Modelling. Code Available: 2
MolFM: A Multimodal Molecular Foundation Model (Jun 6, 2023). Tags: Cross-Modal Retrieval, Knowledge Graphs. Code Available: 2
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset (May 29, 2023). Tags: Audio captioning, Audio-Visual Captioning. Code Available: 2
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset (Apr 17, 2023). Tags: Audio captioning, Audio-Video Question Answering (AVQA). Code Available: 2
Semantic-Conditional Diffusion Networks for Image Captioning (Dec 6, 2022). Tags: Cross-Modal Retrieval, Decoder. Code Available: 2
X^2-VLM: All-In-One Pre-trained Model For Vision-Language Tasks (Nov 22, 2022). Tags: All, Cross-Modal Retrieval. Code Available: 2
PoseScript: Linking 3D Human Poses and Natural Language (Oct 21, 2022). Tags: Cross-Modal Retrieval, Image Captioning. Code Available: 2
Comprehending and Ordering Semantics for Image Captioning (Jun 14, 2022). Tags: Cross-Modal Retrieval, Image Captioning. Code Available: 2
Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote Sensing Image Retrieval (Apr 21, 2022). Tags: Cross-Modal Retrieval, Image Retrieval. Code Available: 2
Vision-Language Pre-Training with Triple Contrastive Learning (Feb 21, 2022). Tags: Contrastive Learning, cross-modal alignment. Code Available: 2
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision (Feb 11, 2021). Tags: Cross-Modal Retrieval, Fine-Grained Image Classification. Code Available: 2
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks (Apr 13, 2020). Tags: Cross-Modal Retrieval, Image Captioning. Code Available: 2
MAKE: Multi-Aspect Knowledge-Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment (May 14, 2025). Tags: Clinical Knowledge, Contrastive Learning. Code Available: 1
Disentangling and Generating Modalities for Recommendation in Missing Modality Scenarios (Apr 23, 2025). Tags: Cross-Modal Retrieval, Recommendation Systems. Code Available: 1
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text (Mar 25, 2025). Tags: Cross-Modal Retrieval, Hallucination. Code Available: 1
Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval (Mar 3, 2025). Tags: Cross-Modal Retrieval, Retrieval. Code Available: 1
ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning (Feb 27, 2025). Tags: Cross-Modal Retrieval, Cross-modal retrieval with noisy correspondence. Code Available: 1
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis (Feb 13, 2025). Tags: Cross-Modal Retrieval, Image Captioning. Code Available: 1
Robust Self-Paced Hashing for Cross-Modal Retrieval with Noisy Labels (Jan 3, 2025). Tags: Computational Efficiency, Cross-Modal Retrieval. Code Available: 1
Fuzzy Multimodal Learning for Trusted Cross-modal Retrieval (Jan 1, 2025). Tags: Cross-Modal Retrieval, Retrieval. Code Available: 1
IMPACT: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents (Dec 10, 2024). Tags: Cross-Modal Retrieval, Image Classification. Code Available: 1
TaxaBind: A Unified Embedding Space for Ecological Applications (Nov 1, 2024). Tags: Audio Classification, Cross-Modal Retrieval. Code Available: 1
Nearest Neighbor Normalization Improves Multimodal Retrieval (Oct 31, 2024). Tags: Cross-Modal Retrieval, Image Captioning. Code Available: 1
BadCM: Invisible Backdoor Attack Against Cross-Modal Learning (Oct 3, 2024). Tags: Backdoor Attack, Cross-Modal Retrieval. Code Available: 1
M3-Jepa: Multimodal Alignment via Multi-directional MoE based on the JEPA framework (Sep 9, 2024). Tags: Computational Efficiency, Cross-Modal Retrieval. Code Available: 1
Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment (Jul 18, 2024). Tags: cross-modal alignment, Cross-Modal Retrieval. Code Available: 1
UGNCL: Uncertainty-Guided Noisy Correspondence Learning for Efficient Cross-Modal Matching (Jul 11, 2024). Tags: Cross-Modal Retrieval, Cross-modal retrieval with noisy correspondence. Code Available: 1
CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval (May 29, 2024). Tags: Cross-Modal Retrieval, Image Retrieval. Code Available: 1
Knowledge-enhanced Visual-Language Pretraining for Computational Pathology (Apr 15, 2024). Tags: Cross-Modal Retrieval, Language Modeling. Code Available: 1
Cross-modal Retrieval with Noisy Correspondence via Consistency Refining and Mining (Mar 25, 2024). Tags: Cross-Modal Retrieval, Cross-modal retrieval with noisy correspondence. Code Available: 1
VXP: Voxel-Cross-Pixel Large-scale Image-LiDAR Place Recognition (Mar 21, 2024). Tags: Cross-modal place recognition, Cross-Modal Retrieval. Code Available: 1