| CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally | Feb 5, 2025 | Attribute, cross-modal alignment | Code Available | 1 |
| WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning | Jan 15, 2025 | cross-modal alignment, Language Modeling | Code Available | 1 |
| Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning | Jan 1, 2025 | cross-modal alignment, Denoising | Code Available | 1 |
| Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Large Model Enhancement | Jan 1, 2025 | cross-modal alignment, Knowledge Distillation | Code Available | 1 |
| Free Lunch Enhancements for Multi-modal Crowd Counting | Jan 1, 2025 | cross-modal alignment, Crowd Counting | Code Available | 1 |
| ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding | Dec 17, 2024 | cross-modal alignment | Code Available | 1 |
| GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency | Dec 12, 2024 | cross-modal alignment, Transfer Learning | Code Available | 1 |
| Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation | Dec 12, 2024 | cross-modal alignment, Multimodal Music Generation | Code Available | 1 |
| Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model | Dec 2, 2024 | cross-modal alignment, Knowledge Distillation | Code Available | 1 |
| SimCMF: A Simple Cross-modal Fine-tuning Strategy from Vision Foundation Models to Any Imaging Modality | Nov 27, 2024 | cross-modal alignment | Code Available | 1 |