| 3D CoCa: Contrastive Learners are 3D Captioners | Apr 13, 2025 | 3D dense captioning, Caption Generation | Code Available | 0 |
| Listen Then See: Video Alignment with Speaker Attention | Apr 21, 2024 | cross-modal alignment, Question Answering | Code Available | 0 |
| SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention | Mar 13, 2024 | 3D visual grounding, cross-modal alignment | Code Available | 0 |
| Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags | Oct 27, 2020 | cross-modal alignment, Representation Learning | Code Available | 0 |
| Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective | Oct 14, 2024 | cross-modal alignment, Image Generation | Code Available | 0 |
| ERNIE-Layout: Layout-Knowledge Enhanced Multi-modal Pre-training for Document Understanding | Jan 16, 2022 | cross-modal alignment, Document Classification | Code Available | 0 |
| A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues | Jul 24, 2022 | cross-modal alignment, Trajectory Planning | Code Available | 0 |
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Apr 18, 2022 | cross-modal alignment, Document AI | Code Available | 0 |
| COST: Contrastive One-Stage Transformer for Vision-Language Small Object Tracking | Apr 2, 2025 | cross-modal alignment, Object | Code Available | 0 |
| MV-CLAM: Multi-View Molecular Interpretation with Cross-Modal Projection via Language Model | Feb 23, 2025 | cross-modal alignment, Language Modeling | Code Available | 0 |
| Language-Guided Diffusion Model for Visual Grounding | Aug 18, 2023 | cross-modal alignment, Denoising | Code Available | 0 |
| Anatomical Attention Alignment representation for Radiology Report Generation | May 12, 2025 | cross-modal alignment, Decoder | Code Available | 0 |
| A coupled autoencoder approach for multi-modal analysis of cell types | Nov 6, 2019 | Clustering, cross-modal alignment | Code Available | 0 |
| Adaptive Spatial Transcriptomics Interpolation via Cross-modal Cross-slice Modeling | May 15, 2025 | cross-modal alignment | Code Available | 0 |
| Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search | Sep 28, 2023 | cross-modal alignment, Cross-Modal Retrieval | Code Available | 0 |
| Language-based Image Colorization: A Benchmark and Beyond | Mar 19, 2025 | Benchmarking, Colorization | Code Available | 0 |
| Enhancing Visual Representation for Text-based Person Searching | Dec 30, 2024 | cross-modal alignment, Person Search | Code Available | 0 |
| SimVTP: Simple Video Text Pre-training with Masked Autoencoders | Dec 7, 2022 | Contrastive Learning, cross-modal alignment | Code Available | 0 |
| OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment | Jun 11, 2025 | cross-modal alignment, Question Answering | Code Available | 0 |
| Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation | Oct 18, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation | Aug 2, 2023 | cross-modal alignment, Denoising | Code Available | 0 |
| Robust Graph Matching Using An Unbalanced Hierarchical Optimal Transport Framework | Oct 18, 2023 | cross-modal alignment, Graph Matching | Code Available | 0 |
| CAST: Cross-modal Alignment Similarity Test for Vision Language Models | Sep 17, 2024 | cross-modal alignment, Question Answering | Code Available | 0 |
| Towards Cross-Modal Text-Molecule Retrieval with Better Modality Alignment | Oct 31, 2024 | Contrastive Learning, cross-modal alignment | Code Available | 0 |
| KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation | Sep 22, 2021 | cross-modal alignment, Knowledge Distillation | Code Available | 0 |
| KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph | Sep 17, 2024 | cross-modal alignment, Image Captioning | Code Available | 0 |
| Asymmetric Cross-Scale Alignment for Text-Based Person Search | Nov 26, 2022 | cross-modal alignment, Person Search | Code Available | 0 |
| Speech-Text Dialog Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment | May 19, 2023 | cross-modal alignment, Emotion Recognition in Conversation | Code Available | 0 |
| Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information | Apr 19, 2021 | cross-modal alignment, Navigate | Code Available | 0 |
| ICPL-ReID: Identity-Conditional Prompt Learning for Multi-Spectral Object Re-Identification | May 23, 2025 | cross-modal alignment, Prompt Learning | Code Available | 0 |
| HyperPath: Knowledge-Guided Hyperbolic Semantic Hierarchy Modeling for WSI Analysis | Jun 19, 2025 | cross-modal alignment, Multiple Instance Learning | Code Available | 0 |
| Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models | May 8, 2025 | Active Learning, cross-modal alignment | Code Available | 0 |
| Unmasked Teacher: Towards Training-Efficient Video Foundation Models | Mar 28, 2023 | Action Classification, Action Recognition | Code Available | 0 |
| DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction | Jul 25, 2024 | cross-modal alignment, Cross-Modal Retrieval | Code Available | 0 |
| CardiacMamba: A Multimodal RGB-RF Fusion Framework with State Space Models for Remote Physiological Measurement | Feb 19, 2025 | cross-modal alignment, Fairness | Code Available | 0 |
| RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems | Mar 6, 2025 | cross-modal alignment | Code Available | 0 |
| HCMA: Hierarchical Cross-model Alignment for Grounded Text-to-Image Generation | May 10, 2025 | cross-modal alignment, Image Generation | Code Available | 0 |
| Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze | Nov 9, 2020 | cross-modal alignment, Image Captioning | Code Available | 0 |
| Reinforced Cross-modal Alignment for Radiology Report Generation | May 1, 2022 | cross-modal alignment, Decision Making | Code Available | 0 |
| It is Never Too Late to Mend: Separate Learning for Multimedia Recommendation | Jun 12, 2024 | cross-modal alignment, Multimedia recommendation | Code Available | 0 |
| Cross-attention for State-based model RWKV-7 | Apr 19, 2025 | cross-modal alignment, Image Generation | Code Available | 0 |
| Craft: Cross-modal Aligned Features Improve Robustness of Prompt Tuning | Jul 22, 2024 | cross-modal alignment | Code Available | 0 |