| MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild | Apr 13, 2024 | cross-modal alignment, Dynamic Facial Expression Recognition | Code Available | 2 |
| Distributionally Robust Alignment for Medical Federated Vision-Language Pre-training Under Data Heterogeneity | Apr 5, 2024 | cross-modal alignment, Federated Learning | Unverified | 0 |
| CIRP: Cross-Item Relational Pre-training for Multimodal Product Bundling | Apr 2, 2024 | cross-modal alignment, Graph Learning | Unverified | 0 |
| SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention | Mar 13, 2024 | 3D visual grounding, cross-modal alignment | Code Available | 0 |
| Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision | Mar 6, 2024 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition | Mar 2, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Multi-modal Attribute Prompting for Vision-Language Models | Mar 1, 2024 | Attribute, cross-modal alignment | Unverified | 0 |
| Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training | Mar 1, 2024 | cross-modal alignment, Representation Learning | Unverified | 0 |
| MENTOR: Multi-level Self-supervised Learning for Multimodal Recommendation | Feb 29, 2024 | cross-modal alignment, Multimodal Recommendation | Code Available | 1 |
| Mind the Modality Gap: Towards a Remote Sensing Vision-Language Model via Cross-modal Alignment | Feb 15, 2024 | cross-modal alignment, Cross-Modal Retrieval | Unverified | 0 |