SOTAVerified

cross-modal alignment

Papers

Showing 101–125 of 342 papers

| Title | Status | Hype |
| --- | --- | --- |
| Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation | Code | 3 |
| MDE: Modality Discrimination Enhancement for Multi-modal Recommendation |  | 0 |
| Leveraging Pre-Trained Models for Multimodal Class-Incremental Learning under Adaptive Fusion |  | 0 |
| Ola: Pushing the Frontiers of Omni-Modal Language Model | Code | 3 |
| CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally | Code | 1 |
| Cross-modal Context Fusion and Adaptive Graph Convolutional Network for Multimodal Conversational Emotion Recognition |  | 0 |
| Integrate Temporal Graph Learning into LLM-based Temporal Knowledge Graph Model |  | 0 |
| WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning | Code | 1 |
| CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection |  | 0 |
| Free Lunch Enhancements for Multi-modal Crowd Counting | Code | 1 |
| Generalized Zero-Shot Classification via Semantics-Free Inter-Class Feature Generation |  | 0 |
| Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment |  | 0 |
| Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning | Code | 1 |
| Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Large Model Enhancement | Code | 1 |
| Audio-Visual Semantic Graph Network for Audio-Visual Event Localization |  | 0 |
| ChartAdapter: Large Vision-Language Model for Chart Summarization |  | 0 |
| Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment |  | 0 |
| Enhancing Visual Representation for Text-based Person Searching | Code | 0 |
| Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data |  | 0 |
| ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding | Code | 1 |
| Wearable Accelerometer Foundation Models for Health via Knowledge Distillation |  | 0 |
| RAC3: Retrieval-Augmented Corner Case Comprehension for Autonomous Driving with Vision-Language Models |  | 0 |
| Dynamic Cross-Modal Alignment for Robust Semantic Location Prediction |  | 0 |
| Enhancing Modality Representation and Alignment for Multimodal Cold-start Active Learning |  | 0 |
| GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency | Code | 1 |

No leaderboard results yet.