SOTAVerified

cross-modal alignment

Papers

Showing 311-320 of 342 papers

Title | Status | Hype
mSLAM: Massively multilingual joint pre-training for speech and text | | 0
ERNIE-Layout: Layout-Knowledge Enhanced Multi-modal Pre-training for Document Understanding | Code | 0
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation | | 0
Align and Prompt: Video-and-Language Pre-training with Entity Prompts | Code | 1
Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision | Code | 1
Learning Better Visual Representations for Weakly-Supervised Object Detection Using Natural Language Supervision | | 0
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation | Code | 0
Learning Joint Embedding with Modality Alignments for Cross-Modal Retrieval of Recipes and Food Images | | 0
Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval | | 0
Dynamic Modality Interaction Modeling for Image-Text Retrieval | Code | 1

No leaderboard results yet.