| CardiacMamba: A Multimodal RGB-RF Fusion Framework with State Space Models for Remote Physiological Measurement | Feb 19, 2025 | cross-modal alignment, Fairness | Code Available | 0 | 5 |
| Listen Then See: Video Alignment with Speaker Attention | Apr 21, 2024 | cross-modal alignment, Question Answering | Code Available | 0 | 5 |
| LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion | Mar 7, 2023 | 3D Object Detection, cross-modal alignment | Code Available | 0 | 5 |
| DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction | Jul 25, 2024 | cross-modal alignment, Cross-Modal Retrieval | Code Available | 0 | 5 |
| Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags | Oct 27, 2020 | cross-modal alignment, Representation Learning | Code Available | 0 | 5 |
| A coupled autoencoder approach for multi-modal analysis of cell types | Nov 6, 2019 | Clustering, cross-modal alignment | Code Available | 0 | 5 |
| Language-based Image Colorization: A Benchmark and Beyond | Mar 19, 2025 | Benchmarking, Colorization | Code Available | 0 | 5 |
| Language-Guided Diffusion Model for Visual Grounding | Aug 18, 2023 | cross-modal alignment, Denoising | Code Available | 0 | 5 |
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Apr 18, 2022 | cross-modal alignment, Document AI | Code Available | 0 | 5 |
| KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph | Sep 17, 2024 | cross-modal alignment, Image Captioning | Code Available | 0 | 5 |