SOTAVerified

cross-modal alignment

Papers

Showing 101-150 of 342 papers

Title | Status | Hype
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment | Code | 1
EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation | Code | 1
Conditional Variational Autoencoder for Sign Language Translation with Cross-Modal Alignment | Code | 1
Towards Bridging the Cross-modal Semantic Gap for Multi-modal Recommendation | Code | 1
PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes | Code | 1
RSRefSeg 2: Decoupling Referring Remote Sensing Image Segmentation with Foundation Models | Code | 1
Towards Cross-Modal Text-Molecule Retrieval with Better Modality Alignment | Code | 0
Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition | Code | 0
SimVTP: Simple Video Text Pre-training with Masked Autoencoders | Code | 0
Adaptive Spatial Transcriptomics Interpolation via Cross-modal Cross-slice Modeling | Code | 0
Speech-Text Dialog Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment | Code | 0
Reinforced Cross-modal Alignment for Radiology Report Generation | Code | 0
A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues | Code | 0
RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems | Code | 0
Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation | Code | 0
Anatomical Attention Alignment representation for Radiology Report Generation | Code | 0
Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models | Code | 0
Robust Graph Matching Using An Unbalanced Hierarchical Optimal Transport Framework | Code | 0
3D CoCa: Contrastive Learners are 3D Captioners | Code | 0
OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment | Code | 0
MV-CLAM: Multi-View Molecular Interpretation with Cross-Modal Projection via Language Model | Code | 0
CAST: Cross-modal Alignment Similarity Test for Vision Language Models | Code | 0
CardiacMamba: A Multimodal RGB-RF Fusion Framework with State Space Models for Remote Physiological Measurement | Code | 0
SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention | Code | 0
Unmasked Teacher: Towards Training-Efficient Video Foundation Models | Code | 0
DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction | Code | 0
A coupled autoencoder approach for multi-modal analysis of cell types | Code | 0
MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation | Code | 0
It is Never Too Late to Mend: Separate Learning for Multimedia Recommendation | Code | 0
Listen Then See: Video Alignment with Speaker Attention | Code | 0
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion | Code | 0
M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base | Code | 0
Cross-attention for State-based model RWKV-7 | Code | 0
Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags | Code | 0
Gaze-Guided Learning: Avoiding Shortcut Bias in Visual Classification | Code | 0
Craft: Cross-modal Aligned Features Improve Robustness of Prompt Tuning | Code | 0
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Code | 0
KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph | Code | 0
COST: Contrastive One-Stage Transformer for Vision-Language Small Object Tracking | Code | 0
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation | Code | 0
Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation | Code | 0
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search | Code | 0
Focus on Focus: Focus-oriented Representation Learning and Multi-view Cross-modal Alignment for Glioma Grading | Code | 0
ICPL-ReID: Identity-Conditional Prompt Learning for Multi-Spectral Object Re-Identification | Code | 0
HyperPath: Knowledge-Guided Hyperbolic Semantic Hierarchy Modeling for WSI Analysis | Code | 0
Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information | Code | 0
Language-based Image Colorization: A Benchmark and Beyond | Code | 0
Asymmetric Cross-Scale Alignment for Text-Based Person Search | Code | 0
Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze | Code | 0
HCMA: Hierarchical Cross-model Alignment for Grounded Text-to-Image Generation | Code | 0

No leaderboard results yet.