SOTAVerified

cross-modal alignment

Papers

Showing 241–250 of 342 papers

| Title | Status | Hype |
|---|---|---|
| Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation | | 0 |
| Shushing! Let's Imagine an Authentic Speech from the Silent Video | | 0 |
| SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training | | 0 |
| SoftCLIP: Softer Cross-modal Alignment Makes CLIP Stronger | | 0 |
| Sound Source Localization is All about Cross-Modal Alignment | | 0 |
| Speech-Language Models with Decoupled Tokenizers and Multi-Token Prediction | | 0 |
| Speech-Text Dialog Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment | | 0 |
| ST-BERT: Cross-modal Language Model Pre-training For End-to-end Spoken Language Understanding | | 0 |
| Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval | | 0 |
| SViQA: A Unified Speech-Vision Multimodal Model for Textless Visual Question Answering | | 0 |

No leaderboard results yet.