SOTAVerified

cross-modal alignment

Papers

Showing 121–130 of 342 papers

Title | Status | Hype
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs | | 0
Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques | | 0
UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation | | 0
EmotionRankCLAP: Bridging Natural Language Speaking Styles and Ordinal Speech Emotion via Rank-N-Contrast | | 0
ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers | | 0
ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs | | 0
DiSa: Directional Saliency-Aware Prompt Learning for Generalizable Vision-Language Models | | 0
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data | | 0
Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection | | 0
Deformable Attentive Visual Enhancement for Referring Segmentation Using Vision-Language Model | | 0
Page 13 of 35

No leaderboard results yet.