SOTAVerified

cross-modal alignment

Papers

Showing 301–325 of 342 papers

Title | Status | Hype
Fine-grained Semantic Alignment Network for Weakly Supervised Temporal Language Grounding | | 0
Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation | Code | 0
Cross-modal Semantic Enhanced Interaction for Image-Sentence Retrieval | | 0
Video Referring Expression Comprehension via Transformer with Content-aware Query | | 0
JPG - Jointly Learn to Align: Automated Disease Prediction and Radiology Report Generation | | 0
TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval | | 0
Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection | | 0
Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval | | 0
OmniVL:One Foundation Model for Image-Language and Video-Language Tasks | | 0
See What You See: Self-supervised Cross-modal Retrieval of Visual Stimuli from Brain Activity | | 0
Masked Vision and Language Modeling for Multi-modal Representation Learning | | 0
Cross-Modal Alignment Learning of Vision-Language Conceptual Systems | | 0
A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues | Code | 0
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix | | 0
Reinforced Cross-modal Alignment for Radiology Report Generation | Code | 0
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Code | 0
mSLAM: Massively multilingual joint pre-training for speech and text | | 0
ERNIE-Layout: Layout-Knowledge Enhanced Multi-modal Pre-training for Document Understanding | Code | 0
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation | | 0
Learning Better Visual Representations for Weakly-Supervised Object Detection Using Natural Language Supervision | | 0
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation | Code | 0
Learning Joint Embedding with Modality Alignments for Cross-Modal Retrieval of Recipes and Food Images | | 0
Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval | | 0
Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information | Code | 0
Continual learning in cross-modal retrieval | | 0

No leaderboard results yet.