SOTAVerified

cross-modal alignment

Papers

Showing 126-150 of 342 papers

Title | Status | Hype
A coupled autoencoder approach for multi-modal analysis of cell types | Code | 0
It is Never Too Late to Mend: Separate Learning for Multimedia Recommendation | Code | 0
MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation | Code | 0
M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base | Code | 0
Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze | Code | 0
Listen Then See: Video Alignment with Speaker Attention | Code | 0
Craft: Cross-modal Aligned Features Improve Robustness of Prompt Tuning | Code | 0
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Code | 0
Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags | Code | 0
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion | Code | 0
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search | Code | 0
Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation | Code | 0
KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph | Code | 0
Language-based Image Colorization: A Benchmark and Beyond | Code | 0
Focus on Focus: Focus-oriented Representation Learning and Multi-view Cross-modal Alignment for Glioma Grading | Code | 0
ICPL-ReID: Identity-Conditional Prompt Learning for Multi-Spectral Object Re-Identification | Code | 0
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation | Code | 0
HyperPath: Knowledge-Guided Hyperbolic Semantic Hierarchy Modeling for WSI Analysis | Code | 0
Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information | Code | 0
Gaze-Guided Learning: Avoiding Shortcut Bias in Visual Classification | Code | 0
Language-Guided Diffusion Model for Visual Grounding | Code | 0
Asymmetric Cross-Scale Alignment for Text-Based Person Search | Code | 0
HCMA: Hierarchical Cross-model Alignment for Grounded Text-to-Image Generation | Code | 0
Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective | Code | 0
Enhancing Visual Representation for Text-based Person Searching | Code | 0

No leaderboard results yet.