SOTAVerified

cross-modal alignment

Papers

Showing 151–175 of 342 papers

| Title | Status | Hype |
| --- | --- | --- |
| HyperPath: Knowledge-Guided Hyperbolic Semantic Hierarchy Modeling for WSI Analysis | Code | 0 |
| ICPL-ReID: Identity-Conditional Prompt Learning for Multi-Spectral Object Re-Identification | Code | 0 |
| See What You See: Self-supervised Cross-modal Retrieval of Visual Stimuli from Brain Activity | | 0 |
| Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection | | 0 |
| Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training | | 0 |
| Semantic-Space-Intervened Diffusive Alignment for Visual Classification | | 0 |
| Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation | | 0 |
| Shushing! Let's Imagine an Authentic Speech from the Silent Video | | 0 |
| SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training | | 0 |
| SoftCLIP: Softer Cross-modal Alignment Makes CLIP Stronger | | 0 |
| Sound Source Localization is All about Cross-Modal Alignment | | 0 |
| Speech-Language Models with Decoupled Tokenizers and Multi-Token Prediction | | 0 |
| Speech-Text Dialog Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment | | 0 |
| ST-BERT: Cross-modal Language Model Pre-training For End-to-end Spoken Language Understanding | | 0 |
| Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval | | 0 |
| SViQA: A Unified Speech-Vision Multimodal Model for Textless Visual Question Answering | | 0 |
| TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio Motion Embedding and Diffusion Interpolation | | 0 |
| TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models | | 0 |
| Temporal Order Preserved Optimal Transport-based Cross-modal Knowledge Transfer Learning for ASR | | 0 |
| Tiny-Align: Bridging Automatic Speech Recognition and Large Language Model on the Edge | | 0 |
| TMCIR: Token Merge Benefits Composed Image Retrieval | | 0 |
| TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval | | 0 |
| TOT: Topology-Aware Optimal Transport For Multimodal Hate Detection | | 0 |
| Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images | | 0 |
| Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques | | 0 |

No leaderboard results yet.