| Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation | Feb 12, 2025 | cross-modal alignment, multimodal generation | Code Available | 3 |
| MDE: Modality Discrimination Enhancement for Multi-modal Recommendation | Feb 8, 2025 | cross-modal alignment, Multi-modal Recommendation | Unverified | 0 |
| Leveraging Pre-Trained Models for Multimodal Class-Incremental Learning under Adaptive Fusion | Feb 7, 2025 | class-incremental learning, Class Incremental Learning | Unverified | 0 |
| Ola: Pushing the Frontiers of Omni-Modal Language Model | Feb 6, 2025 | cross-modal alignment, Language Modeling | Code Available | 3 |
| CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally | Feb 5, 2025 | Attribute, cross-modal alignment | Code Available | 1 |
| Cross-modal Context Fusion and Adaptive Graph Convolutional Network for Multimodal Conversational Emotion Recognition | Jan 25, 2025 | cross-modal alignment, Emotion Classification | Unverified | 0 |
| Integrate Temporal Graph Learning into LLM-based Temporal Knowledge Graph Model | Jan 21, 2025 | cross-modal alignment, Graph Embedding | Unverified | 0 |
| WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning | Jan 15, 2025 | cross-modal alignment, Language Modeling | Code Available | 1 |
| CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection | Jan 8, 2025 | Computational Efficiency, cross-modal alignment | Unverified | 0 |
| Free Lunch Enhancements for Multi-modal Crowd Counting | Jan 1, 2025 | cross-modal alignment, Crowd Counting | Code Available | 1 |
| Generalized Zero-Shot Classification via Semantics-Free Inter-Class Feature Generation | Jan 1, 2025 | Classification, cross-modal alignment | Unverified | 0 |
| Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment | Jan 1, 2025 | Attribute, cross-modal alignment | Unverified | 0 |
| Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning | Jan 1, 2025 | cross-modal alignment, Denoising | Code Available | 1 |
| Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Large Model Enhancement | Jan 1, 2025 | cross-modal alignment, Knowledge Distillation | Code Available | 1 |
| Audio-Visual Semantic Graph Network for Audio-Visual Event Localization | Jan 1, 2025 | audio-visual event localization, cross-modal alignment | Unverified | 0 |
| ChartAdapter: Large Vision-Language Model for Chart Summarization | Dec 30, 2024 | Chart Understanding, cross-modal alignment | Unverified | 0 |
| Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment | Dec 30, 2024 | cross-modal alignment, Emotion Recognition | Unverified | 0 |
| Enhancing Visual Representation for Text-based Person Searching | Dec 30, 2024 | cross-modal alignment, Person Search | Code Available | 0 |
| Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data | Dec 19, 2024 | AutoML, cross-modal alignment | Unverified | 0 |
| ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding | Dec 17, 2024 | cross-modal alignment | Code Available | 1 |
| Wearable Accelerometer Foundation Models for Health via Knowledge Distillation | Dec 15, 2024 | Activity Recognition, cross-modal alignment | Unverified | 0 |
| RAC3: Retrieval-Augmented Corner Case Comprehension for Autonomous Driving with Vision-Language Models | Dec 15, 2024 | Autonomous Driving, Contrastive Learning | Unverified | 0 |
| Dynamic Cross-Modal Alignment for Robust Semantic Location Prediction | Dec 13, 2024 | cross-modal alignment, Prediction | Unverified | 0 |
| Enhancing Modality Representation and Alignment for Multimodal Cold-start Active Learning | Dec 12, 2024 | Active Learning, cross-modal alignment | Unverified | 0 |
| GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency | Dec 12, 2024 | cross-modal alignment, Transfer Learning | Code Available | 1 |