SOTAVerified

cross-modal alignment

Papers

Showing 101-125 of 342 papers

Title | Status | Hype
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment | Code | 1
Mask Grounding for Referring Image Segmentation | Code | 1
SimCMF: A Simple Cross-modal Fine-tuning Strategy from Vision Foundation Models to Any Imaging Modality | Code | 1
Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning | Code | 1
The Devil is in the Details: Boosting Guided Depth Super-Resolution via Rethinking Cross-Modal Alignment and Aggregation | Code | 1
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | Code | 1
Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models | Code | 0
OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment | Code | 0
RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems | Code | 0
MV-CLAM: Multi-View Molecular Interpretation with Cross-Modal Projection via Language Model | Code | 0
Adaptive Spatial Transcriptomics Interpolation via Cross-modal Cross-slice Modeling | Code | 0
A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues | Code | 0
Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation | Code | 0
Anatomical Attention Alignment representation for Radiology Report Generation | Code | 0
MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation | Code | 0
Robust Graph Matching Using An Unbalanced Hierarchical Optimal Transport Framework | Code | 0
3D CoCa: Contrastive Learners are 3D Captioners | Code | 0
CAST: Cross-modal Alignment Similarity Test for Vision Language Models | Code | 0
M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base | Code | 0
CardiacMamba: A Multimodal RGB-RF Fusion Framework with State Space Models for Remote Physiological Measurement | Code | 0
Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags | Code | 0
DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction | Code | 0
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Code | 0
A coupled autoencoder approach for multi-modal analysis of cell types | Code | 0
KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph | Code | 0

No leaderboard results yet.