SOTAVerified

cross-modal alignment

Papers

Showing 251–275 of 342 papers

| Title | Status | Hype |
| --- | --- | --- |
| Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation | Code | 0 |
| Text-guided Image Restoration and Semantic Enhancement for Text-to-Image Person Retrieval | Code | 1 |
| WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation | | 0 |
| Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models | | 0 |
| Global and Local Semantic Completion Learning for Vision-Language Pre-training | Code | 1 |
| ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning | Code | 1 |
| SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | Code | 1 |
| Improving speech translation by fusing speech and text | | 0 |
| Speech-Text Dialog Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment | | 0 |
| Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training | | 0 |
| AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment | | 0 |
| Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining | Code | 1 |
| CoVLR: Coordinating Cross-Modal Consistency and Intra-Modal Structure for Vision-Language Retrieval | | 0 |
| Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation | Code | 1 |
| SoftCLIP: Softer Cross-modal Alignment Makes CLIP Stronger | | 0 |
| Unmasked Teacher: Towards Training-Efficient Video Foundation Models | Code | 0 |
| Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens | Code | 1 |
| CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment | Code | 1 |
| LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion | Code | 0 |
| HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention | Code | 1 |
| TOT: Topology-Aware Optimal Transport For Multimodal Hate Detection | | 0 |
| End-to-end Semantic Object Detection with Cross-Modal Alignment | | 0 |
| Does Vision Accelerate Hierarchical Generalization in Neural Language Learners? | | 0 |
| Improving Cross-modal Alignment for Text-Guided Image Inpainting | | 0 |
| Linguistic Query-Guided Mask Generation for Referring Image Segmentation | | 0 |

No leaderboard results yet.