| DanceIt: Music-inspired Dancing Video Synthesis | Sep 17, 2020 | cross-modal alignment, Rhythm | Code Available | 1 |
| AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition | Oct 21, 2024 | cross-modal alignment, speech-recognition | Code Available | 1 |
| Conditional Variational Autoencoder for Sign Language Translation with Cross-Modal Alignment | Dec 25, 2023 | cross-modal alignment, Decoder | Code Available | 1 |
| Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning | Jul 16, 2024 | Caption Generation, cross-modal alignment | Code Available | 1 |
| CoMP: Continual Multimodal Pre-training for Vision Foundation Models | Mar 24, 2025 | cross-modal alignment | Code Available | 1 |
| Dynamic Modality Interaction Modeling for Image-Text Retrieval | Jul 11, 2021 | cross-modal alignment, Cross-Modal Retrieval | Code Available | 1 |
| EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation | Jun 21, 2021 | 3D Semantic Segmentation, Autonomous Driving | Code Available | 1 |
| Free Lunch Enhancements for Multi-modal Crowd Counting | Jan 1, 2025 | cross-modal alignment, Crowd Counting | Code Available | 1 |
| Low-resource Neural Machine Translation with Cross-modal Alignment | Oct 13, 2022 | Contrastive Learning, cross-modal alignment | Code Available | 1 |
| Multi-Semantic Fusion Model for Generalized Zero-Shot Skeleton-Based Action Recognition | Sep 18, 2023 | Action Recognition, cross-modal alignment | Code Available | 1 |