SOTAVerified

cross-modal alignment

Papers

Showing 151–175 of 342 papers

| Title | Status | Hype |
|---|---|---|
| Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training |  | 0 |
| TS-HTFA: Advancing Time Series Forecasting via Hierarchical Text-Free Alignment with Large Language Models |  | 0 |
| Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment |  | 0 |
| MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | Code | 1 |
| OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities |  | 0 |
| CAST: Cross-modal Alignment Similarity Test for Vision Language Models | Code | 0 |
| KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph | Code | 0 |
| NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training |  | 0 |
| Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization |  | 0 |
| GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding |  | 0 |
| Temporal Order Preserved Optimal Transport-based Cross-modal Knowledge Transfer Learning for ASR |  | 0 |
| Law of Vision Representation in MLLMs | Code | 2 |
| Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding | Code | 1 |
| A Survey on Facial Expression Recognition of Static and Dynamic Emotions | Code | 1 |
| Focus on Focus: Focus-oriented Representation Learning and Multi-view Cross-modal Alignment for Glioma Grading | Code | 0 |
| Coarse-to-fine Alignment Makes Better Speech-image Retrieval |  | 0 |
| Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval |  | 0 |
| Advancing Multi-grained Alignment for Contrastive Language-Audio Pre-training | Code | 1 |
| Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation |  | 0 |
| Disentangled Noisy Correspondence Learning |  | 0 |
| Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach | Code | 2 |
| Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment |  | 0 |
| DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction | Code | 0 |
| Multimodal Machine Learning in Mental Health: A Survey of Data, Algorithms, and Challenges |  | 0 |
| Craft: Cross-modal Aligned Features Improve Robustness of Prompt Tuning | Code | 0 |

No leaderboard results yet.