SOTAVerified

cross-modal alignment

Papers

Showing 101-125 of 342 papers

Title | Status | Hype
CoMP: Continual Multimodal Pre-training for Vision Foundation Models | Code | 1
ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning | Code | 1
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks | Code | 1
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning | Code | 1
Symbiotic Adversarial Learning for Attribute-based Person Search | Code | 1
A Survey on Facial Expression Recognition of Static and Dynamic Emotions | Code | 1
OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment | Code | 0
Adaptive Spatial Transcriptomics Interpolation via Cross-modal Cross-slice Modeling | Code | 0
MV-CLAM: Multi-View Molecular Interpretation with Cross-Modal Projection via Language Model | Code | 0
Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models | Code | 0
A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues | Code | 0
MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation | Code | 0
Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation | Code | 0
Anatomical Attention Alignment representation for Radiology Report Generation | Code | 0
Robust Graph Matching Using An Unbalanced Hierarchical Optimal Transport Framework | Code | 0
3D CoCa: Contrastive Learners are 3D Captioners | Code | 0
M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base | Code | 0
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion | Code | 0
CAST: Cross-modal Alignment Similarity Test for Vision Language Models | Code | 0
CardiacMamba: A Multimodal RGB-RF Fusion Framework with State Space Models for Remote Physiological Measurement | Code | 0
Listen Then See: Video Alignment with Speaker Attention | Code | 0
RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems | Code | 0
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Code | 0
DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction | Code | 0
Language-Guided Diffusion Model for Visual Grounding | Code | 0
Page 5 of 14

No leaderboard results yet.