| Melody-Guided Music Generation | Sep 30, 2024 | cross-modal alignmentMusic Generation | CodeCode Available | 2 | 5 |
| AerialVLN: Vision-and-Language Navigation for UAVs | Aug 13, 2023 | cross-modal alignmentNavigate | CodeCode Available | 2 | 5 |
| Law of Vision Representation in MLLMs | Aug 29, 2024 | cross-modal alignmentLanguage Modeling | CodeCode Available | 2 | 5 |
| Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment | Jan 1, 2024 | cross-modal alignmentCross-Modal Retrieval | CodeCode Available | 2 | 5 |
| Mitigate the Gap: Investigating Approaches for Improving Cross-Modal Alignment in CLIP | Jun 25, 2024 | cross-modal alignmentImage Classification | CodeCode Available | 2 | 5 |
| DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment | Jul 3, 2025 | cross-modal alignmentInstruction Following | CodeCode Available | 2 | 5 |
| Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate | Oct 9, 2024 | cross-modal alignmentVisual Question Answering | CodeCode Available | 2 | 5 |
| Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation | Jan 2, 2024 | Audio Generationcross-modal alignment | CodeCode Available | 2 | 5 |
| Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach | Aug 2, 2024 | cross-modal alignmentMultiple Object Tracking | CodeCode Available | 2 | 5 |
| CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection | Oct 4, 2023 | 3D Object Detectioncross-modal alignment | CodeCode Available | 2 | 5 |