SOTAVerified

cross-modal alignment

Papers

Showing 301-342 of 342 papers

| Title | Status | Hype |
| --- | --- | --- |
| Cross-Modal Alignment Learning of Vision-Language Conceptual Systems | | 0 |
| A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues | Code | 0 |
| BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning | Code | 1 |
| VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix | | 0 |
| mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | Code | 1 |
| Reinforced Cross-modal Alignment for Radiology Report Generation | Code | 0 |
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Code | 0 |
| DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors | Code | 1 |
| Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding | Code | 1 |
| Vision-Language Pre-Training with Triple Contrastive Learning | Code | 2 |
| mSLAM: Massively multilingual joint pre-training for speech and text | | 0 |
| ERNIE-Layout: Layout-Knowledge Enhanced Multi-modal Pre-training for Document Understanding | Code | 0 |
| KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation | | 0 |
| Align and Prompt: Video-and-Language Pre-training with Entity Prompts | Code | 1 |
| Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision | Code | 1 |
| Learning Better Visual Representations for Weakly-Supervised Object Detection Using Natural Language Supervision | | 0 |
| KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation | Code | 0 |
| Learning Joint Embedding with Modality Alignments for Cross-Modal Retrieval of Recipes and Food Images | | 0 |
| Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval | | 0 |
| Dynamic Modality Interaction Modeling for Image-Text Retrieval | Code | 1 |
| EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation | Code | 1 |
| Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information | Code | 0 |
| Continual learning in cross-modal retrieval | | 0 |
| Scene-Intuitive Agent for Remote Embodied Visual Grounding | | 0 |
| Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze | Code | 0 |
| Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags | Code | 0 |
| ST-BERT: Cross-modal Language Model Pre-training For End-to-end Spoken Language Understanding | | 0 |
| Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos | | 0 |
| Cross-Modal Alignment with Mixture Experts Neural Network for Intral-City Retail Recommendation | | 0 |
| DanceIt: Music-inspired Dancing Video Synthesis | Code | 1 |
| Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation | Code | 0 |
| Symbiotic Adversarial Learning for Attribute-based Person Search | Code | 1 |
| Learning Multi-Modal Nonlinear Embeddings: Performance Bounds and an Algorithm | | 0 |
| Cross-Modal Cross-Domain Moment Alignment Network for Person Search | | 0 |
| Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models | | 0 |
| Continuous Sign Language Recognition Through Cross-Modal Alignment of Video and Text Embeddings in a Joint-Latent Space | | 0 |
| MCQA: Multimodal Co-attention Based Network for Question Answering | | 0 |
| Curriculum Audiovisual Learning | | 0 |
| A coupled autoencoder approach for multi-modal analysis of cell types | Code | 0 |
| ACMM: Aligned Cross-Modal Memory for Few-Shot Image and Sentence Matching | | 0 |
| Mix and match networks: cross-modal alignment for zero-pair image-to-image translation | | 0 |
| Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces | | 0 |

No leaderboard results yet.