| Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques | Jun 5, 2025 | cross-modal alignment, Large Language Model | Unverified | 0 |
| UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation | Jun 4, 2025 | cross-modal alignment, Lipreading | Unverified | 0 |
| EmotionRankCLAP: Bridging Natural Language Speaking Styles and Ordinal Speech Emotion via Rank-N-Contrast | May 29, 2025 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| DiSa: Directional Saliency-Aware Prompt Learning for Generalizable Vision-Language Models | May 26, 2025 | cross-modal alignment, Domain Generalization | Unverified | 0 |
| From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data | May 26, 2025 | cross-modal alignment, Instruction Following | Unverified | 0 |
| ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs | May 26, 2025 | cross-modal alignment, Emotion Recognition | Unverified | 0 |
| ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers | May 26, 2025 | cross-modal alignment, Position | Unverified | 0 |
| Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval | May 26, 2025 | Contrastive Learning, cross-modal alignment | Code Available | 1 |
| Deformable Attentive Visual Enhancement for Referring Segmentation Using Vision-Language Model | May 25, 2025 | cross-modal alignment, Image Segmentation | Unverified | 0 |
| Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection | May 25, 2025 | cross-modal alignment, Scene Understanding | Unverified | 0 |