SOTAVerified

cross-modal alignment

Papers

Showing 151–200 of 342 papers

| Title | Status | Hype |
|---|---|---|
| Continuous Sign Language Recognition Through Cross-Modal Alignment of Video and Text Embeddings in a Joint-Latent Space | | 0 |
| CoVLR: Coordinating Cross-Modal Consistency and Intra-Modal Structure for Vision-Language Retrieval | | 0 |
| Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation | | 0 |
| Cross-Modal Alignment Learning of Vision-Language Conceptual Systems | | 0 |
| Cross-Modal Alignment with Mixture Experts Neural Network for Intral-City Retail Recommendation | | 0 |
| Cross-modal Alignment with Optimal Transport for CTC-based ASR | | 0 |
| Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval | | 0 |
| Cross-modal Context Fusion and Adaptive Graph Convolutional Network for Multimodal Conversational Emotion Recognition | | 0 |
| Cross-Modal Cross-Domain Moment Alignment Network for Person Search | | 0 |
| Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval | | 0 |
| Cross-Modal Prototype based Multimodal Federated Learning under Severely Missing Modality | | 0 |
| Cross-modal Semantic Enhanced Interaction for Image-Sentence Retrieval | | 0 |
| CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis | | 0 |
| Curriculum Audiovisual Learning | | 0 |
| DALR: Dual-level Alignment Learning for Multimodal Sentence Representation Learning | | 0 |
| DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation | | 0 |
| Towards Brain Passage Retrieval -- An Investigation of EEG Query Representations | | 0 |
| Deformable Attentive Visual Enhancement for Referring Segmentation Using Vision-Language Model | | 0 |
| Denoising and Alignment: Rethinking Domain Generalization for Multimodal Face Anti-Spoofing | | 0 |
| DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding | | 0 |
| Detection-based Intermediate Supervision for Visual Question Answering | | 0 |
| DF-Calib: Targetless LiDAR-Camera Calibration via Depth Flow | | 0 |
| DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment | | 0 |
| DiSa: Directional Saliency-Aware Prompt Learning for Generalizable Vision-Language Models | | 0 |
| Disentangled Noisy Correspondence Learning | | 0 |
| Does Vision Accelerate Hierarchical Generalization in Neural Language Learners? | | 0 |
| Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs | | 0 |
| Technical Approach for the EMI Challenge in the 8th Affective Behavior Analysis in-the-Wild Competition | | 0 |
| DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications | | 0 |
| Dynamic Cross-Modal Alignment for Robust Semantic Location Prediction | | 0 |
| EA-VTR: Event-Aware Video-Text Retrieval | | 0 |
| EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment | | 0 |
| EmotionRankCLAP: Bridging Natural Language Speaking Styles and Ordinal Speech Emotion via Rank-N-Contrast | | 0 |
| End-to-end Semantic Object Detection with Cross-Modal Alignment | | 0 |
| Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework | | 0 |
| Enhancing LLMs for Time Series Forecasting via Structure-Guided Cross-Modal Alignment | | 0 |
| Enhancing Modality Representation and Alignment for Multimodal Cold-start Active Learning | | 0 |
| Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment | | 0 |
| Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data | | 0 |
| Evaluating Attribute Confusion in Fashion Text-to-Image Generation | | 0 |
| Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training | | 0 |
| Fine-grained Semantic Alignment Network for Weakly Supervised Temporal Language Grounding | | 0 |
| FineLIP: Extending CLIP's Reach via Fine-Grained Alignment with Longer Text Inputs | | 0 |
| From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data | | 0 |
| Fully Aligned Network for Referring Image Segmentation | | 0 |
| Fusing Cross-modal and Uni-modal Representations: A Kronecker Product Approach | | 0 |
| GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding | | 0 |
| GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations | | 0 |
| Generalized Zero-Shot Classification via Semantics-Free Inter-Class Feature Generation | | 0 |
| Generating Vision-Language Navigation Instructions Incorporated Fine-Grained Alignment Annotations | | 0 |

No leaderboard results yet.