| Multi-modal Attribute Prompting for Vision-Language Models | Mar 1, 2024 | Attribute, cross-modal alignment | Unverified | 0 |
| Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval | Sep 23, 2022 | cross-modal alignment, Information Retrieval | Unverified | 0 |
| Multimodal Machine Learning in Mental Health: A Survey of Data, Algorithms, and Challenges | Jul 23, 2024 | cross-modal alignment, Fairness | Unverified | 0 |
| Multimodal Reasoning with Multimodal Knowledge Graph | Jun 4, 2024 | cross-modal alignment, Graph Attention | Unverified | 0 |
| Multi-path Exploration and Feedback Adjustment for Text-to-Image Person Retrieval | Oct 26, 2024 | cross-modal alignment, Person Retrieval | Unverified | 0 |
| Multi-Prompts Learning with Cross-Modal Alignment for Attribute-based Person Re-Identification | Dec 28, 2023 | Attribute, cross-modal alignment | Unverified | 0 |
| Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training | May 13, 2023 | cross-modal alignment | Unverified | 0 |
| NeuroLIP: Interpretable and Fair Cross-Modal Alignment of fMRI and Phenotypic Text | Mar 27, 2025 | Attribute, Contrastive Learning | Unverified | 0 |
| NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training | Sep 15, 2024 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| NOTA: Multimodal Music Notation Understanding for Visual Large Language Model | Feb 17, 2025 | cross-modal alignment, Language Modeling | Unverified | 0 |
| Observation-Graph Interaction and Key-Detail Guidance for Vision and Language Navigation | Mar 14, 2025 | cross-modal alignment, Navigate | Unverified | 0 |
| OMCAT: Omni Context Aware Transformer | Oct 15, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Unverified | 0 |
| OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All | May 25, 2024 | All, cross-modal alignment | Unverified | 0 |
| OmniVL: One Foundation Model for Image-Language and Video-Language Tasks | Sep 15, 2022 | Action Classification, Action Recognition | Unverified | 0 |
| OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities | Sep 17, 2024 | cross-modal alignment, Question Answering | Unverified | 0 |
| On the Language Encoder of Contrastive Cross-modal Models | Oct 20, 2023 | cross-modal alignment, Sentence | Unverified | 0 |
| OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection | Dec 12, 2023 | cross-modal alignment, object-detection | Unverified | 0 |
| OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection | Mar 9, 2025 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing | May 6, 2025 | cross-modal alignment | Unverified | 0 |
| PMMTalk: Speech-Driven 3D Facial Animation from Complementary Pseudo Multi-modal Features | Dec 5, 2023 | cross-modal alignment, Decoder | Unverified | 0 |
| Prompt-based Context- and Domain-aware Pretraining for Vision and Language Navigation | Sep 7, 2023 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation | Apr 29, 2025 | cross-modal alignment, Decoder | Code Available | 0 |
| HCMA: Hierarchical Cross-model Alignment for Grounded Text-to-Image Generation | May 10, 2025 | cross-modal alignment, Image Generation | Code Available | 0 |
| Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze | Nov 9, 2020 | cross-modal alignment, Image Captioning | Code Available | 0 |
| Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation | Aug 4, 2020 | 2D Pose Estimation, 3D Human Pose Estimation | Code Available | 0 |