| Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations | Mar 24, 2025 | cross-modal alignment, Image Classification | Code Available | 1 |
| CoMP: Continual Multimodal Pre-training for Vision Foundation Models | Mar 24, 2025 | cross-modal alignment | Code Available | 1 |
| Language-based Image Colorization: A Benchmark and Beyond | Mar 19, 2025 | Benchmarking, Colorization | Code Available | 0 |
| Shushing! Let's Imagine an Authentic Speech from the Silent Video | Mar 19, 2025 | cross-modal alignment, Language Modeling | Unverified | 0 |
| Observation-Graph Interaction and Key-Detail Guidance for Vision and Language Navigation | Mar 14, 2025 | cross-modal alignment, Navigate | Unverified | 0 |
| Technical Approach for the EMI Challenge in the 8th Affective Behavior Analysis in-the-Wild Competition | Mar 13, 2025 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| 4D-ACFNet: A 4D Attention Mechanism-Based Prognostic Framework for Colorectal Cancer Liver Metastasis Integrating Multimodal Spatiotemporal Features | Mar 12, 2025 | cross-modal alignment, Disentanglement | Unverified | 0 |
| Hierarchical Cross-Modal Alignment for Open-Vocabulary 3D Object Detection | Mar 10, 2025 | 3D Object Detection, cross-modal alignment | Unverified | 0 |
| LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition? | Mar 10, 2025 | cross-modal alignment | Unverified | 0 |
| OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection | Mar 9, 2025 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images | Mar 8, 2025 | cross-modal alignment, Diagnostic | Code Available | 3 |
| RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems | Mar 6, 2025 | cross-modal alignment | Code Available | 0 |
| Cross-modal Causal Relation Alignment for Video Question Grounding | Mar 5, 2025 | Contrastive Learning, cross-modal alignment | Code Available | 1 |
| Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data | Mar 3, 2025 | cross-modal alignment, Style Transfer | Unverified | 0 |
| Language Model Mapping in Multimodal Music Learning: A Grand Challenge Proposal | Mar 1, 2025 | cross-modal alignment, Language Modeling | Unverified | 0 |
| UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting | Feb 25, 2025 | 3DGS, cross-modal alignment | Unverified | 0 |
| SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding | Feb 24, 2025 | cross-modal alignment, Visual Grounding | Code Available | 1 |
| DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications | Feb 24, 2025 | cross-modal alignment, Earth Observation | Unverified | 0 |
| MV-CLAM: Multi-View Molecular Interpretation with Cross-Modal Projection via Language Model | Feb 23, 2025 | cross-modal alignment, Language Modeling | Code Available | 0 |
| CrossOver: 3D Scene Cross-Modal Alignment | Feb 20, 2025 | cross-modal alignment, Object | Code Available | 3 |
| CardiacMamba: A Multimodal RGB-RF Fusion Framework with State Space Models for Remote Physiological Measurement | Feb 19, 2025 | cross-modal alignment, Fairness | Code Available | 0 |
| NOTA: Multimodal Music Notation Understanding for Visual Large Language Model | Feb 17, 2025 | cross-modal alignment, Language Modeling | Unverified | 0 |
| A Survey of Automatic Prompt Engineering: An Optimization Perspective | Feb 17, 2025 | cross-modal alignment, Prompt Engineering | Unverified | 0 |
| Phantom: Subject-consistent video generation via cross-modal alignment | Feb 16, 2025 | cross-modal alignment, Human-Domain Subject-to-Video | Code Available | 5 |
| Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation | Feb 12, 2025 | cross-modal alignment, multimodal generation | Code Available | 3 |