SOTAVerified

cross-modal alignment

Papers

Showing 101-125 of 342 papers

Title | Status | Hype
Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding | Code | 1
SimCMF: A Simple Cross-modal Fine-tuning Strategy from Vision Foundation Models to Any Imaging Modality | Code | 1
PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes | Code | 1
Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning | Code | 1
Symbiotic Adversarial Learning for Attribute-based Person Search | Code | 1
The Devil is in the Details: Boosting Guided Depth Super-Resolution via Rethinking Cross-Modal Alignment and Aggregation | Code | 1
Enhancing LLMs for Time Series Forecasting via Structure-Guided Cross-Modal Alignment | | 0
Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection | | 0
Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework | | 0
EmotionRankCLAP: Bridging Natural Language Speaking Styles and Ordinal Speech Emotion via Rank-N-Contrast | | 0
Coarse-to-fine Alignment Makes Better Speech-image Retrieval | | 0
A Survey of Automatic Prompt Engineering: An Optimization Perspective | | 0
EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment | | 0
EA-VTR: Event-Aware Video-Text Retrieval | | 0
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance | | 0
Dynamic Cross-Modal Alignment for Robust Semantic Location Prediction | | 0
DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications | | 0
Technical Approach for the EMI Challenge in the 8th Affective Behavior Analysis in-the-Wild Competition | | 0
Clip4Retrofit: Enabling Real-Time Image Labeling on Edge Devices via Cross-Architecture CLIP Distillation | | 0
End-to-end Semantic Object Detection with Cross-Modal Alignment | | 0
ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs | | 0
4D-ACFNet: A 4D Attention Mechanism-Based Prognostic Framework for Colorectal Cancer Liver Metastasis Integrating Multimodal Spatiotemporal Features | | 0
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs | | 0
Enhancing Modality Representation and Alignment for Multimodal Cold-start Active Learning | | 0
Does Vision Accelerate Hierarchical Generalization in Neural Language Learners? | | 0

No leaderboard results yet.