| CoMP: Continual Multimodal Pre-training for Vision Foundation Models | Mar 24, 2025 | cross-modal alignment | Code Available | 1 |
| Global and Local Semantic Completion Learning for Vision-Language Pre-training | Jun 12, 2023 | cross-modal alignment, Image-text Retrieval | Code Available | 1 |
| Mask Grounding for Referring Image Segmentation | Dec 19, 2023 | cross-modal alignment, Image Segmentation | Code Available | 1 |
| mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | May 24, 2022 | Computational Efficiency, cross-modal alignment | Code Available | 1 |
| RSRefSeg 2: Decoupling Referring Remote Sensing Image Segmentation with Foundation Models | Jul 8, 2025 | cross-modal alignment, Image Segmentation | Code Available | 1 |
| WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning | Jan 15, 2025 | cross-modal alignment, Language Modeling | Code Available | 1 |
| Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment | Dec 30, 2024 | cross-modal alignment, Emotion Recognition | Unverified | 0 |
| Enhancing LLMs for Time Series Forecasting via Structure-Guided Cross-Modal Alignment | May 19, 2025 | cross-modal alignment, Time Series | Unverified | 0 |
| Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection | May 25, 2025 | cross-modal alignment, Scene Understanding | Unverified | 0 |
| Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework | Jul 12, 2024 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| EmotionRankCLAP: Bridging Natural Language Speaking Styles and Ordinal Speech Emotion via Rank-N-Contrast | May 29, 2025 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| Coarse-to-fine Alignment Makes Better Speech-image Retrieval | Aug 15, 2024 | cross-modal alignment, Image Retrieval | Unverified | 0 |
| A Survey of Automatic Prompt Engineering: An Optimization Perspective | Feb 17, 2025 | cross-modal alignment, Prompt Engineering | Unverified | 0 |
| EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment | Oct 8, 2024 | cross-modal alignment, Hallucination | Unverified | 0 |
| EA-VTR: Event-Aware Video-Text Retrieval | Jul 10, 2024 | Action Recognition, Contrastive Learning | Unverified | 0 |
| CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance | Dec 5, 2024 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| Dynamic Cross-Modal Alignment for Robust Semantic Location Prediction | Dec 13, 2024 | cross-modal alignment, Prediction | Unverified | 0 |
| DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications | Feb 24, 2025 | cross-modal alignment, Earth Observation | Unverified | 0 |
| Technical Approach for the EMI Challenge in the 8th Affective Behavior Analysis in-the-Wild Competition | Mar 13, 2025 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| End-to-end Semantic Object Detection with Cross-Modal Alignment | Feb 10, 2023 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| Clip4Retrofit: Enabling Real-Time Image Labeling on Edge Devices via Cross-Architecture CLIP Distillation | May 23, 2025 | Autonomous Driving, cross-modal alignment | Unverified | 0 |
| ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs | May 26, 2025 | cross-modal alignment, Emotion Recognition | Unverified | 0 |
| 4D-ACFNet: A 4D Attention Mechanism-Based Prognostic Framework for Colorectal Cancer Liver Metastasis Integrating Multimodal Spatiotemporal Features | Mar 12, 2025 | cross-modal alignment, Disentanglement | Unverified | 0 |
| Enhancing Modality Representation and Alignment for Multimodal Cold-start Active Learning | Dec 12, 2024 | Active Learning, cross-modal alignment | Unverified | 0 |
| Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs | Jun 5, 2025 | cross-modal alignment, Dense Captioning | Unverified | 0 |
| Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data | Mar 3, 2025 | cross-modal alignment, Style Transfer | Unverified | 0 |
| Does Vision Accelerate Hierarchical Generalization in Neural Language Learners? | Feb 1, 2023 | cross-modal alignment, Language Acquisition | Unverified | 0 |
| CIRP: Cross-Item Relational Pre-training for Multimodal Product Bundling | Apr 2, 2024 | cross-modal alignment, Graph Learning | Unverified | 0 |
| Disentangled Noisy Correspondence Learning | Aug 10, 2024 | cross-modal alignment, Cross-Modal Retrieval | Unverified | 0 |
| Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment | Jan 1, 2025 | Attribute, cross-modal alignment | Unverified | 0 |
| Language Model Mapping in Multimodal Music Learning: A Grand Challenge Proposal | Mar 1, 2025 | cross-modal alignment, Language Modeling | Unverified | 0 |
| ChartAdapter: Large Vision-Language Model for Chart Summarization | Dec 30, 2024 | Chart Understanding, cross-modal alignment | Unverified | 0 |
| DiSa: Directional Saliency-Aware Prompt Learning for Generalizable Vision-Language Models | May 26, 2025 | cross-modal alignment, Domain Generalization | Unverified | 0 |
| CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection | Jan 8, 2025 | Computational Efficiency, cross-modal alignment | Unverified | 0 |
| DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment | Aug 22, 2023 | Attribute, Constituency Parsing | Unverified | 0 |
| KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation | Jan 16, 2022 | cross-modal alignment, Knowledge Distillation | Unverified | 0 |
| DF-Calib: Targetless LiDAR-Camera Calibration via Depth Flow | Apr 2, 2025 | Autonomous Driving, Camera Calibration | Unverified | 0 |
| A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models | Apr 22, 2025 | cross-modal alignment, Script Generation | Unverified | 0 |
| Detection-based Intermediate Supervision for Visual Question Answering | Dec 26, 2023 | cross-modal alignment, Logical Reasoning | Unverified | 0 |
| CATVis: Context-Aware Thought Visualization | Jul 15, 2025 | cross-modal alignment, EEG | Unverified | 0 |
| Intriguing Properties of Large Language and Vision Models | Oct 7, 2024 | cross-modal alignment, Large Language Model | Unverified | 0 |
| DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding | May 8, 2025 | 3D visual grounding, cross-modal alignment | Unverified | 0 |
| Denoising and Alignment: Rethinking Domain Generalization for Multimodal Face Anti-Spoofing | May 14, 2025 | cross-modal alignment, Denoising | Unverified | 0 |
| ALN-P3: Unified Language Alignment for Perception, Prediction, and Planning in Autonomous Driving | May 21, 2025 | Autonomous Driving, cross-modal alignment | Unverified | 0 |
| Deformable Attentive Visual Enhancement for Referring Segmentation Using Vision-Language Model | May 25, 2025 | cross-modal alignment, Image Segmentation | Unverified | 0 |
| Towards Brain Passage Retrieval -- An Investigation of EEG Query Representations | Dec 9, 2024 | cross-modal alignment, EEG | Unverified | 0 |
| Integrate Temporal Graph Learning into LLM-based Temporal Knowledge Graph Model | Jan 21, 2025 | cross-modal alignment, Graph Embedding | Unverified | 0 |
| JPG - Jointly Learn to Align: Automated Disease Prediction and Radiology Report Generation | Oct 1, 2022 | cross-modal alignment, Disease Prediction | Unverified | 0 |
| LangBridge: Interpreting Image as a Combination of Language Embeddings | Mar 25, 2025 | cross-modal alignment | Unverified | 0 |
| DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation | Nov 29, 2023 | cross-modal alignment, Navigate | Unverified | 0 |