| Masked Vision and Language Modeling for Multi-modal Representation Learning | Aug 3, 2022 | cross-modal alignment, Language Modeling | —Unverified | 0 | 0 |
| MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval | Oct 30, 2023 | cross-modal alignment, Image-text Retrieval | —Unverified | 0 | 0 |
| MCQA: Multimodal Co-attention Based Network for Question Answering | Apr 25, 2020 | cross-modal alignment, Question Answering | —Unverified | 0 | 0 |
| MDE: Modality Discrimination Enhancement for Multi-modal Recommendation | Feb 8, 2025 | cross-modal alignment, Multi-modal Recommendation | —Unverified | 0 | 0 |
| Mind the Modality Gap: Towards a Remote Sensing Vision-Language Model via Cross-modal Alignment | Feb 15, 2024 | cross-modal alignment, Cross-Modal Retrieval | —Unverified | 0 | 0 |
| Distributionally Robust Alignment for Medical Federated Vision-Language Pre-training Under Data Heterogeneity | Apr 5, 2024 | cross-modal alignment, Federated Learning | —Unverified | 0 | 0 |
| Mix and match networks: cross-modal alignment for zero-pair image-to-image translation | Mar 8, 2019 | cross-modal alignment, Decoder | —Unverified | 0 | 0 |
| MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment Retrieval | Jun 25, 2024 | cross-modal alignment, Moment Retrieval | —Unverified | 0 | 0 |
| MLLMs are Deeply Affected by Modality Bias | May 24, 2025 | cross-modal alignment | —Unverified | 0 | 0 |
| Modeling the Human Visual System: Comparative Insights from Response-Optimized and Task-Optimized Vision Models, Language Models, and different Readout Mechanisms | Oct 17, 2024 | cross-modal alignment, Large Language Model | —Unverified | 0 | 0 |