| UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting | Feb 25, 2025 | 3DGScross-modal alignment | —Unverified | 0 |
| SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding | Feb 24, 2025 | cross-modal alignmentVisual Grounding | CodeCode Available | 1 |
| DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications | Feb 24, 2025 | cross-modal alignmentEarth Observation | —Unverified | 0 |
| MV-CLAM: Multi-View Molecular Interpretation with Cross-Modal Projection via Language Model | Feb 23, 2025 | cross-modal alignmentLanguage Modeling | CodeCode Available | 0 |
| CrossOver: 3D Scene Cross-Modal Alignment | Feb 20, 2025 | cross-modal alignmentObject | CodeCode Available | 3 |
| CardiacMamba: A Multimodal RGB-RF Fusion Framework with State Space Models for Remote Physiological Measurement | Feb 19, 2025 | cross-modal alignmentFairness | CodeCode Available | 0 |
| NOTA: Multimodal Music Notation Understanding for Visual Large Language Model | Feb 17, 2025 | cross-modal alignmentLanguage Modeling | —Unverified | 0 |
| A Survey of Automatic Prompt Engineering: An Optimization Perspective | Feb 17, 2025 | cross-modal alignmentPrompt Engineering | —Unverified | 0 |
| Phantom: Subject-consistent video generation via cross-modal alignment | Feb 16, 2025 | cross-modal alignmentHuman-Domain Subject-to-Video | CodeCode Available | 5 |
| Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation | Feb 12, 2025 | cross-modal alignmentmultimodal generation | CodeCode Available | 3 |