SOTAVerified

cross-modal alignment

Papers

Showing 26-50 of 342 papers

Title | Status | Hype
ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs | - | 0
ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers | - | 0
Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval | Code | 1
Deformable Attentive Visual Enhancement for Referring Segmentation Using Vision-Language Model | - | 0
Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection | - | 0
MLLMs are Deeply Affected by Modality Bias | - | 0
ICPL-ReID: Identity-Conditional Prompt Learning for Multi-Spectral Object Re-Identification | Code | 0
Clip4Retrofit: Enabling Real-Time Image Labeling on Edge Devices via Cross-Architecture CLIP Distillation | - | 0
Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval | - | 0
ALN-P3: Unified Language Alignment for Perception, Prediction, and Planning in Autonomous Driving | - | 0
CAD: A General Multimodal Framework for Video Deepfake Detection via Cross-Modal Alignment and Distillation | - | 0
U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding | Code | 1
Enhancing LLMs for Time Series Forecasting via Structure-Guided Cross-Modal Alignment | - | 0
Beyond Modality Collapse: Representations Blending for Multimodal Dataset Distillation | - | 0
FALCON: False-Negative Aware Learning of Contrastive Negatives in Vision-Language Pretraining | - | 0
VISTA: Enhancing Vision-Text Alignment in MLLMs via Cross-Modal Mutual Information Maximization | - | 0
Adaptive Spatial Transcriptomics Interpolation via Cross-modal Cross-slice Modeling | Code | 0
MSCI: Addressing CLIP's Inherent Limitations for Compositional Zero-Shot Learning | Code | 1
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning | Code | 3
Denoising and Alignment: Rethinking Domain Generalization for Multimodal Face Anti-Spoofing | - | 0
Anatomical Attention Alignment representation for Radiology Report Generation | Code | 0
HCMA: Hierarchical Cross-model Alignment for Grounded Text-to-Image Generation | Code | 0
Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition | Code | 0
Semantic-Space-Intervened Diffusive Alignment for Visual Classification | - | 0
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding | - | 0

No leaderboard results yet.