
cross-modal alignment

Papers

Showing 201–225 of 342 papers

Title | Status | Hype
MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild | Code | 2
Distributionally Robust Alignment for Medical Federated Vision-Language Pre-training Under Data Heterogeneity | | 0
CIRP: Cross-Item Relational Pre-training for Multimodal Product Bundling | | 0
SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention | Code | 0
Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision | | 0
A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition | Code | 1
Multi-modal Attribute Prompting for Vision-Language Models | | 0
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training | | 0
MENTOR: Multi-level Self-supervised Learning for Multimodal Recommendation | Code | 1
Mind the Modality Gap: Towards a Remote Sensing Vision-Language Model via Cross-modal Alignment | | 0
Cross-Modal Prototype based Multimodal Federated Learning under Severely Missing Modality | | 0
Multi-level Cross-modal Alignment for Image Clustering | | 0
The Devil is in the Details: Boosting Guided Depth Super-Resolution via Rethinking Cross-Modal Alignment and Aggregation | Code | 1
Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection | | 0
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation | Code | 2
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment | Code | 2
Multi-Prompts Learning with Cross-Modal Alignment for Attribute-based Person Re-Identification | | 0
Detection-based Intermediate Supervision for Visual Question Answering | | 0
Conditional Variational Autoencoder for Sign Language Translation with Cross-Modal Alignment | Code | 1
BrainVis: Exploring the Bridge between Brain and Visual Signals via Image Reconstruction | Code | 1
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval | Code | 1
Mask Grounding for Referring Image Segmentation | Code | 1
M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base | Code | 0
Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning | | 0
ViLA: Efficient Video-Language Alignment for Video Question Answering | Code | 1

No leaderboard results yet.