| Audio-Visual Semantic Graph Network for Audio-Visual Event Localization | Jan 1, 2025 | audio-visual event localization, cross-modal alignment | —Unverified | 0 | 0 |
| AutoRad-Lung: A Radiomic-Guided Prompting Autoregressive Vision-Language Model for Lung Nodule Malignancy Prediction | Mar 26, 2025 | Computed Tomography (CT), cross-modal alignment | —Unverified | 0 | 0 |
| Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data | Dec 19, 2024 | AutoML, cross-modal alignment | —Unverified | 0 | 0 |
| Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models | May 15, 2020 | coreference-resolution, Coreference Resolution | —Unverified | 0 | 0 |
| Beyond Modality Collapse: Representations Blending for Multimodal Dataset Distillation | May 16, 2025 | cross-modal alignment, Dataset Distillation | —Unverified | 0 | 0 |
| Bridge Feature Matching and Cross-Modal Alignment with Mutual-filtering for Zero-shot Anomaly Detection | Jul 15, 2025 | Anomaly Classification, Anomaly Detection | —Unverified | 0 | 0 |
| CAD: A General Multimodal Framework for Video Deepfake Detection via Cross-Modal Alignment and Distillation | May 21, 2025 | cross-modal alignment, DeepFake Detection | —Unverified | 0 | 0 |
| CADFormer: Fine-Grained Cross-modal Alignment and Decoding Transformer for Referring Remote Sensing Image Segmentation | Mar 30, 2025 | cross-modal alignment, Image Segmentation | —Unverified | 0 | 0 |
| CATVis: Context-Aware Thought Visualization | Jul 15, 2025 | cross-modal alignment, EEG | —Unverified | 0 | 0 |
| CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection | Jan 8, 2025 | Computational Efficiency, cross-modal alignment | —Unverified | 0 | 0 |
| ChartAdapter: Large Vision-Language Model for Chart Summarization | Dec 30, 2024 | Chart Understanding, cross-modal alignment | —Unverified | 0 | 0 |
| Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment | Jan 1, 2025 | Attribute, cross-modal alignment | —Unverified | 0 | 0 |
| CIRP: Cross-Item Relational Pre-training for Multimodal Product Bundling | Apr 2, 2024 | cross-modal alignment, Graph Learning | —Unverified | 0 | 0 |
| Clip4Retrofit: Enabling Real-Time Image Labeling on Edge Devices via Cross-Architecture CLIP Distillation | May 23, 2025 | Autonomous Driving, cross-modal alignment | —Unverified | 0 | 0 |
| CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance | Dec 5, 2024 | Contrastive Learning, cross-modal alignment | —Unverified | 0 | 0 |
| Coarse-to-fine Alignment Makes Better Speech-image Retrieval | Aug 15, 2024 | cross-modal alignment, Image Retrieval | —Unverified | 0 | 0 |
| Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection | May 25, 2025 | cross-modal alignment, Scene Understanding | —Unverified | 0 | 0 |
| Context-Enhanced Video Moment Retrieval with Large Language Models | May 21, 2024 | cross-modal alignment, Language Modeling | —Unverified | 0 | 0 |
| Continual learning in cross-modal retrieval | Apr 14, 2021 | Continual Learning, cross-modal alignment | —Unverified | 0 | 0 |
| Continuous Sign Language Recognition Through Cross-Modal Alignment of Video and Text Embeddings in a Joint-Latent Space | May 11, 2020 | cross-modal alignment, Decoder | —Unverified | 0 | 0 |
| CoVLR: Coordinating Cross-Modal Consistency and Intra-Modal Structure for Vision-Language Retrieval | Apr 15, 2023 | cross-modal alignment, Cross-Modal Retrieval | —Unverified | 0 | 0 |
| Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation | Aug 14, 2024 | cross-modal alignment, Image Segmentation | —Unverified | 0 | 0 |
| Cross-Modal Alignment Learning of Vision-Language Conceptual Systems | Jul 31, 2022 | cross-modal alignment, Representation Learning | —Unverified | 0 | 0 |
| Cross-Modal Alignment with Mixture Experts Neural Network for Intral-City Retail Recommendation | Sep 17, 2020 | cross-modal alignment, Image to text | —Unverified | 0 | 0 |
| Cross-modal Alignment with Optimal Transport for CTC-based ASR | Sep 24, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval | Jul 1, 2024 | cross-modal alignment, Image Retrieval | —Unverified | 0 | 0 |
| Cross-modal Context Fusion and Adaptive Graph Convolutional Network for Multimodal Conversational Emotion Recognition | Jan 25, 2025 | cross-modal alignment, Emotion Classification | —Unverified | 0 | 0 |
| Cross-Modal Cross-Domain Moment Alignment Network for Person Search | Jun 1, 2020 | cross-modal alignment, Person Search | —Unverified | 0 | 0 |
| Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval | Aug 15, 2024 | cross-modal alignment, Denoising | —Unverified | 0 | 0 |
| Cross-Modal Prototype based Multimodal Federated Learning under Severely Missing Modality | Jan 25, 2024 | cross-modal alignment, Federated Learning | —Unverified | 0 | 0 |
| Cross-modal Semantic Enhanced Interaction for Image-Sentence Retrieval | Oct 17, 2022 | cross-modal alignment, Object | —Unverified | 0 | 0 |
| CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis | Nov 1, 2024 | cross-modal alignment, Phenotype classification | —Unverified | 0 | 0 |
| Curriculum Audiovisual Learning | Jan 26, 2020 | Clustering, cross-modal alignment | —Unverified | 0 | 0 |
| DALR: Dual-level Alignment Learning for Multimodal Sentence Representation Learning | Jun 26, 2025 | cross-modal alignment, Representation Learning | —Unverified | 0 | 0 |
| DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation | Nov 29, 2023 | cross-modal alignment, Navigate | —Unverified | 0 | 0 |
| Towards Brain Passage Retrieval -- An Investigation of EEG Query Representations | Dec 9, 2024 | cross-modal alignment, EEG | —Unverified | 0 | 0 |
| Deformable Attentive Visual Enhancement for Referring Segmentation Using Vision-Language Model | May 25, 2025 | cross-modal alignment, Image Segmentation | —Unverified | 0 | 0 |
| Denoising and Alignment: Rethinking Domain Generalization for Multimodal Face Anti-Spoofing | May 14, 2025 | cross-modal alignment, Denoising | —Unverified | 0 | 0 |
| DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding | May 8, 2025 | 3D visual grounding, cross-modal alignment | —Unverified | 0 | 0 |
| Detection-based Intermediate Supervision for Visual Question Answering | Dec 26, 2023 | cross-modal alignment, Logical Reasoning | —Unverified | 0 | 0 |
| DF-Calib: Targetless LiDAR-Camera Calibration via Depth Flow | Apr 2, 2025 | Autonomous Driving, Camera Calibration | —Unverified | 0 | 0 |
| DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment | Aug 22, 2023 | Attribute, Constituency Parsing | —Unverified | 0 | 0 |
| DiSa: Directional Saliency-Aware Prompt Learning for Generalizable Vision-Language Models | May 26, 2025 | cross-modal alignment, Domain Generalization | —Unverified | 0 | 0 |
| Disentangled Noisy Correspondence Learning | Aug 10, 2024 | cross-modal alignment, Cross-Modal Retrieval | —Unverified | 0 | 0 |
| Does Vision Accelerate Hierarchical Generalization in Neural Language Learners? | Feb 1, 2023 | cross-modal alignment, Language Acquisition | —Unverified | 0 | 0 |
| Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs | Jun 5, 2025 | cross-modal alignment, Dense Captioning | —Unverified | 0 | 0 |
| Technical Approach for the EMI Challenge in the 8th Affective Behavior Analysis in-the-Wild Competition | Mar 13, 2025 | Contrastive Learning, cross-modal alignment | —Unverified | 0 | 0 |
| DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications | Feb 24, 2025 | cross-modal alignment, Earth Observation | —Unverified | 0 | 0 |
| Dynamic Cross-Modal Alignment for Robust Semantic Location Prediction | Dec 13, 2024 | cross-modal alignment, Prediction | —Unverified | 0 | 0 |
| EA-VTR: Event-Aware Video-Text Retrieval | Jul 10, 2024 | Action Recognition, Contrastive Learning | —Unverified | 0 | 0 |