| CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment | May 2, 2025 | audio-visual learning, cross-modal alignment | Code Available | 1 | 5 |
| Mask Grounding for Referring Image Segmentation | Dec 19, 2023 | cross-modal alignment, Image Segmentation | Code Available | 1 | 5 |
| SimCMF: A Simple Cross-modal Fine-tuning Strategy from Vision Foundation Models to Any Imaging Modality | Nov 27, 2024 | cross-modal alignment | Code Available | 1 | 5 |
| Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning | Jan 1, 2025 | cross-modal alignment, Denoising | Code Available | 1 | 5 |
| The Devil is in the Details: Boosting Guided Depth Super-Resolution via Rethinking Cross-Modal Alignment and Aggregation | Jan 16, 2024 | cross-modal alignment, feature selection | Code Available | 1 | 5 |
| mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | May 24, 2022 | Computational Efficiency, cross-modal alignment | Code Available | 1 | 5 |
| Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models | May 8, 2025 | Active Learning, cross-modal alignment | Code Available | 0 | 5 |
| OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment | Jun 11, 2025 | cross-modal alignment, Question Answering | Code Available | 0 | 5 |
| RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems | Mar 6, 2025 | cross-modal alignment | Code Available | 0 | 5 |
| MV-CLAM: Multi-View Molecular Interpretation with Cross-Modal Projection via Language Model | Feb 23, 2025 | cross-modal alignment, Language Modeling | Code Available | 0 | 5 |
| Adaptive Spatial Transcriptomics Interpolation via Cross-modal Cross-slice Modeling | May 15, 2025 | cross-modal alignment | Code Available | 0 | 5 |
| A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues | Jul 24, 2022 | cross-modal alignment, Trajectory Planning | Code Available | 0 | 5 |
| Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation | Oct 18, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| Anatomical Attention Alignment representation for Radiology Report Generation | May 12, 2025 | cross-modal alignment, Decoder | Code Available | 0 | 5 |
| MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation | Apr 29, 2025 | cross-modal alignment, Decoder | Code Available | 0 | 5 |
| Robust Graph Matching Using An Unbalanced Hierarchical Optimal Transport Framework | Oct 18, 2023 | cross-modal alignment, Graph Matching | Code Available | 0 | 5 |
| 3D CoCa: Contrastive Learners are 3D Captioners | Apr 13, 2025 | 3D dense captioning, Caption Generation | Code Available | 0 | 5 |
| CAST: Cross-modal Alignment Similarity Test for Vision Language Models | Sep 17, 2024 | cross-modal alignment, Question Answering | Code Available | 0 | 5 |
| M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base | Dec 16, 2023 | cross-modal alignment, Knowledge Graphs | Code Available | 0 | 5 |
| CardiacMamba: A Multimodal RGB-RF Fusion Framework with State Space Models for Remote Physiological Measurement | Feb 19, 2025 | cross-modal alignment, Fairness | Code Available | 0 | 5 |
| Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags | Oct 27, 2020 | cross-modal alignment, Representation Learning | Code Available | 0 | 5 |
| DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction | Jul 25, 2024 | cross-modal alignment, Cross-Modal Retrieval | Code Available | 0 | 5 |
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Apr 18, 2022 | cross-modal alignment, Document AI | Code Available | 0 | 5 |
| A coupled autoencoder approach for multi-modal analysis of cell types | Nov 6, 2019 | Clustering, cross-modal alignment | Code Available | 0 | 5 |
| KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph | Sep 17, 2024 | cross-modal alignment, Image Captioning | Code Available | 0 | 5 |