SOTAVerified

cross-modal alignment

Papers

Showing 251-300 of 342 papers

Title | Status | Hype
EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment | | 0
EmotionRankCLAP: Bridging Natural Language Speaking Styles and Ordinal Speech Emotion via Rank-N-Contrast | | 0
End-to-end Semantic Object Detection with Cross-Modal Alignment | | 0
Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework | | 0
Enhancing LLMs for Time Series Forecasting via Structure-Guided Cross-Modal Alignment | | 0
Enhancing Modality Representation and Alignment for Multimodal Cold-start Active Learning | | 0
Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment | | 0
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data | | 0
Evaluating Attribute Confusion in Fashion Text-to-Image Generation | | 0
Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training | | 0
Fine-grained Semantic Alignment Network for Weakly Supervised Temporal Language Grounding | | 0
FineLIP: Extending CLIP's Reach via Fine-Grained Alignment with Longer Text Inputs | | 0
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data | | 0
Fully Aligned Network for Referring Image Segmentation | | 0
Fusing Cross-modal and Uni-modal Representations: A Kronecker Product Approach | | 0
GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding | | 0
GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations | | 0
Generalized Zero-Shot Classification via Semantics-Free Inter-Class Feature Generation | | 0
Generating Vision-Language Navigation Instructions Incorporated Fine-Grained Alignment Annotations | | 0
GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-grained Video-language Learning | | 0
Hierarchical Cross-Modal Alignment for Open-Vocabulary 3D Object Detection | | 0
Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching | | 0
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | | 0
How do Cross-View and Cross-Modal Alignment Affect Representations in Contrastive Learning? | | 0
Improving Cross-modal Alignment for Text-Guided Image Inpainting | | 0
Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning | | 0
Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration | | 0
Improving speech translation by fusing speech and text | | 0
InfoMAE: Pair-Efficient Cross-Modal Alignment for Multimodal Time-Series Sensing Signals | | 0
Integrate Temporal Graph Learning into LLM-based Temporal Knowledge Graph Model | | 0
Intriguing Properties of Large Language and Vision Models | | 0
JPG - Jointly Learn to Align: Automated Disease Prediction and Radiology Report Generation | | 0
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation | | 0
LangBridge: Interpreting Image as a Combination of Language Embeddings | | 0
Linguistic Query-Guided Mask Generation for Referring Image Segmentation | | 0
Learning Better Visual Representations for Weakly-Supervised Object Detection Using Natural Language Supervision | | 0
Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision | | 0
Learning Joint Embedding with Modality Alignments for Cross-Modal Retrieval of Recipes and Food Images | | 0
Learning Multi-Modal Nonlinear Embeddings: Performance Bounds and an Algorithm | | 0
Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment | | 0
Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding | | 0
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval | | 0
Leveraging Pre-Trained Models for Multimodal Class-Incremental Learning under Adaptive Fusion | | 0
LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition? | | 0
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization | | 0
Masked Vision and Language Modeling for Multi-modal Representation Learning | | 0
MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval | | 0
MCQA: Multimodal Co-attention Based Network for Question Answering | | 0
MDE: Modality Discrimination Enhancement for Multi-modal Recommendation | | 0
Mind the Modality Gap: Towards a Remote Sensing Vision-Language Model via Cross-modal Alignment | | 0

No leaderboard results yet.