ImageBind: One Embedding Space To Bind Them All May 9, 2023 All Cross-Modal Retrieval
Code Code Available 55 Multimodal Whole Slide Foundation Model for Pathology Nov 29, 2024 Cross-Modal Retrieval model
Code Code Available 45 AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities Nov 12, 2022 Contrastive Learning Cross-Modal Retrieval
Code Code Available 45 AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation Apr 4, 2023 Cross-Modal Retrieval Image-text Retrieval
Code Code Available 35 Merlin: A Vision Language Foundation Model for 3D Computed Tomography Jun 10, 2024 3D Semantic Segmentation Computed Tomography (CT)
Code Code Available 35 Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment Jan 1, 2024 cross-modal alignment Cross-Modal Retrieval
Code Code Available 25 VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset May 29, 2023 Audio captioning Audio-Visual Captioning
Code Code Available 25 SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing Dec 20, 2023 Attribute Cross-Modal Retrieval
Code Code Available 25 Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner May 16, 2025 Cross-Modal Retrieval Diagnostic
Code Code Available 25 X^2-VLM: All-In-One Pre-trained Model For Vision-Language Tasks Nov 22, 2022 All Cross-Modal Retrieval
Code Code Available 25 Composed Multi-modal Retrieval: A Survey of Approaches and Applications Mar 3, 2025 Cross-Modal Retrieval Data Augmentation
Code Code Available 25 Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology Mar 19, 2025 Cross-Modal Retrieval Diagnostic
Code Code Available 25 PoseScript: Linking 3D Human Poses and Natural Language Oct 21, 2022 Cross-Modal Retrieval Image Captioning
Code Code Available 25 EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis Sep 10, 2024 Contrastive Learning Cross-Modal Retrieval
Code Code Available 25 Comprehending and Ordering Semantics for Image Captioning Jun 14, 2022 Cross-Modal Retrieval Image Captioning
Code Code Available 25 Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision Feb 11, 2021 Cross-Modal Retrieval Fine-Grained Image Classification
Code Code Available 25 Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation Sep 30, 2024 Cross-Modal Retrieval Dynamic Time Warping
Code Code Available 25 Semantic-Conditional Diffusion Networks for Image Captioning Dec 6, 2022 Cross-Modal Retrieval Decoder
Code Code Available 25 Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation Mar 12, 2024 Cross-Modal Retrieval GPU
Code Code Available 25 Vision-Language Pre-Training with Triple Contrastive Learning Feb 21, 2022 Contrastive Learning cross-modal alignment
Code Code Available 25 Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment Apr 28, 2024 Cross-Modal Retrieval Image Retrieval
Code Code Available 25 Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote Sensing Image Retrieval Apr 21, 2022 Cross-Modal Retrieval Image Retrieval
Code Code Available 25 MolFM: A Multimodal Molecular Foundation Model Jun 6, 2023 Cross-Modal Retrieval Knowledge Graphs
Code Code Available 25 Large Language Models are In-Context Molecule Learners Mar 7, 2024 Cross-Modal Retrieval In-Context Learning
Code Code Available 25 LeanVec: Searching vectors faster by making them fit Dec 26, 2023 Cross-Modal Retrieval Dimensionality Reduction
Code Code Available 25 VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset Apr 17, 2023 Audio captioning Audio-Video Question Answering (AVQA)
Code Code Available 25 RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing Jun 20, 2023 Cross-Modal Retrieval Image Retrieval
Code Code Available 25 RemoteCLIP: A Vision Language Foundation Model for Remote Sensing Jun 19, 2023 Classification Cross-Modal Retrieval
Code Code Available 25 Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language Jun 9, 2024 Contrastive Learning Cross-Modal Retrieval
Code Code Available 25 Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks Apr 13, 2020 Cross-Modal Retrieval Image Captioning
Code Code Available 25 Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks Jun 7, 2023 Cross-Modal Retrieval Language Modelling
Code Code Available 25 Cross-modal Retrieval for Knowledge-based Visual Question Answering Jan 11, 2024 Cross-Modal Retrieval Question Answering
Code Code Available 15 A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language Sep 12, 2022 Contrastive Learning Cross-Modal Retrieval
Code Code Available 15 M3-Jepa: Multimodal Alignment via Multi-directional MoE based on the JEPA framework Sep 9, 2024 Computational Efficiency Cross-Modal Retrieval
Code Code Available 15 A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval Dec 6, 2022 Cross-Modal Retrieval Image-text matching
Code Code Available 15 Cross-Modal Retrieval for Motion and Text via DopTriple Loss May 7, 2023 Cross-Modal Retrieval Retrieval
Code Code Available 15 FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks Mar 4, 2023 Cross-Modal Retrieval Image Captioning
Code Code Available 15 Adaptive label-aware graph convolutional networks for cross-modal retrieval Aug 6, 2021 Cross-Modal Retrieval Representation Learning
Code Code Available 15 Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval Oct 19, 2022 Cross-Modal Retrieval Image Retrieval
Code Code Available 15 COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning Nov 1, 2020 Cross-Modal Retrieval Representation Learning
Code Code Available 15 BadCM: Invisible Backdoor Attack Against Cross-Modal Learning Oct 3, 2024 Backdoor Attack Cross-Modal Retrieval
Code Code Available 15 Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning Aug 26, 2022 Cross-Modal Retrieval Machine Translation
Code Code Available 15 Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions Aug 28, 2023 Cross-Modal Retrieval Retrieval
Code Code Available 15 FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval May 20, 2020 Cross-Modal Retrieval Retrieval
Code Code Available 15 Dual adversarial graph neural networks for multi-label cross-modal retrieval May 18, 2021 Cross-Modal Retrieval Retrieval
Code Code Available 15 Dynamic Modality Interaction Modeling for Image-Text Retrieval Jul 11, 2021 cross-modal alignment Cross-Modal Retrieval
Code Code Available 15 Domain-Smoothing Network for Zero-Shot Sketch-Based Image Retrieval Jun 22, 2021 Cross-Modal Retrieval Diversity
Code Code Available 15 Align before Fuse: Vision and Language Representation Learning with Momentum Distillation Jul 16, 2021 Cross-Modal Retrieval Grounded language learning
Code Code Available 15 Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment Jul 18, 2024 cross-modal alignment Cross-Modal Retrieval
Code Code Available 15 A Survey on Interpretable Cross-modal Reasoning Sep 5, 2023 Cross-Modal Retrieval Decision Making
Code Code Available 15