SOTAVerified

cross-modal alignment

Papers

Showing 226–250 of 342 papers

| Title | Status | Hype |
|---|---|---|
| Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization | | 0 |
| GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding | | 0 |
| Temporal Order Preserved Optimal Transport-based Cross-modal Knowledge Transfer Learning for ASR | | 0 |
| Focus on Focus: Focus-oriented Representation Learning and Multi-view Cross-modal Alignment for Glioma Grading | Code | 0 |
| Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval | | 0 |
| Coarse-to-fine Alignment Makes Better Speech-image Retrieval | | 0 |
| Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation | | 0 |
| Disentangled Noisy Correspondence Learning | | 0 |
| Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment | | 0 |
| DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction | Code | 0 |
| Multimodal Machine Learning in Mental Health: A Survey of Data, Algorithms, and Challenges | | 0 |
| Craft: Cross-modal Aligned Features Improve Robustness of Prompt Tuning | Code | 0 |
| Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework | | 0 |
| EA-VTR: Event-Aware Video-Text Retrieval | | 0 |
| Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval | | 0 |
| MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment Retrieval | | 0 |
| It is Never Too Late to Mend: Separate Learning for Multimedia Recommendation | Code | 0 |
| Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching | | 0 |
| Multimodal Reasoning with Multimodal Knowledge Graph | | 0 |
| OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All | | 0 |
| AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability | | 0 |
| Context-Enhanced Video Moment Retrieval with Large Language Models | | 0 |
| Listen Then See: Video Alignment with Speaker Attention | Code | 0 |
| Distributionally Robust Alignment for Medical Federated Vision-Language Pre-training Under Data Heterogeneity | | 0 |
| CIRP: Cross-Item Relational Pre-training for Multimodal Product Bundling | | 0 |
Page 10 of 14

No leaderboard results yet.