SOTAVerified

cross-modal alignment

Papers

Showing 301-325 of 342 papers

Title | Status | Hype
M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base | Code | 0
Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition | Code | 0
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion | Code | 0
SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention | Code | 0
Gaze-Guided Learning: Avoiding Shortcut Bias in Visual Classification | Code | 0
Focus on Focus: Focus-oriented Representation Learning and Multi-view Cross-modal Alignment for Glioma Grading | Code | 0
Listen Then See: Video Alignment with Speaker Attention | Code | 0
3D CoCa: Contrastive Learners are 3D Captioners | Code | 0
MV-CLAM: Multi-View Molecular Interpretation with Cross-Modal Projection via Language Model | Code | 0
Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags | Code | 0
A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues | Code | 0
Craft: Cross-modal Aligned Features Improve Robustness of Prompt Tuning | Code | 0
Anatomical Attention Alignment representation for Radiology Report Generation | Code | 0
A coupled autoencoder approach for multi-modal analysis of cell types | Code | 0
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Code | 0
Adaptive Spatial Transcriptomics Interpolation via Cross-modal Cross-slice Modeling | Code | 0
Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective | Code | 0
OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment | Code | 0
SimVTP: Simple Video Text Pre-training with Masked Autoencoders | Code | 0
ERNIE-Layout: Layout-Knowledge Enhanced Multi-modal Pre-training for Document Understanding | Code | 0
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search | Code | 0
Enhancing Visual Representation for Text-based Person Searching | Code | 0
Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation | Code | 0
Language-Guided Diffusion Model for Visual Grounding | Code | 0
Language-based Image Colorization: A Benchmark and Beyond | Code | 0

No leaderboard results yet.