| Melody-Guided Music Generation | Sep 30, 2024 | cross-modal alignment, Music Generation | Code Available | 2 | 5 |
| Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach | Aug 2, 2024 | cross-modal alignment, Multiple Object Tracking | Code Available | 2 | 5 |
| HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding | Apr 20, 2024 | cross-modal alignment, Visual Grounding | Code Available | 2 | 5 |
| A Survey on Facial Expression Recognition of Static and Dynamic Emotions | Aug 28, 2024 | cross-modal alignment, Facial Expression Recognition | Code Available | 1 | 5 |
| Align and Prompt: Video-and-Language Pre-training with Entity Prompts | Dec 17, 2021 | cross-modal alignment, Entity Alignment | Code Available | 1 | 5 |
| ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding | Dec 17, 2024 | cross-modal alignment | Code Available | 1 | 5 |
| Global and Local Semantic Completion Learning for Vision-Language Pre-training | Jun 12, 2023 | cross-modal alignment, Image-text Retrieval | Code Available | 1 | 5 |
| Free Lunch Enhancements for Multi-modal Crowd Counting | Jan 1, 2025 | cross-modal alignment, Crowd Counting | Code Available | 1 | 5 |
| GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency | Dec 12, 2024 | cross-modal alignment, Transfer Learning | Code Available | 1 | 5 |
| CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment | Mar 10, 2023 | cross-modal alignment, Sign Language Recognition | Code Available | 1 | 5 |
| DanceIt: Music-inspired Dancing Video Synthesis | Sep 17, 2020 | cross-modal alignment, Rhythm | Code Available | 1 | 5 |
| Advancing Multi-grained Alignment for Contrastive Language-Audio Pre-training | Aug 15, 2024 | cross-modal alignment | Code Available | 1 | 5 |
| Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation | Aug 24, 2023 | cross-modal alignment, Descriptive | Code Available | 1 | 5 |
| Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Aug 29, 2022 | cross-modal alignment, Image-text Retrieval | Code Available | 1 | 5 |
| BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning | Jun 17, 2022 | cross-modal alignment, Representation Learning | Code Available | 1 | 5 |
| Dynamic Modality Interaction Modeling for Image-Text Retrieval | Jul 11, 2021 | cross-modal alignment, Cross-Modal Retrieval | Code Available | 1 | 5 |
| Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations | Mar 24, 2025 | cross-modal alignment, Image Classification | Code Available | 1 | 5 |
| Conditional Variational Autoencoder for Sign Language Translation with Cross-Modal Alignment | Dec 25, 2023 | cross-modal alignment, Decoder | Code Available | 1 | 5 |
| A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition | Mar 2, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 | 5 |
| AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition | Oct 21, 2024 | cross-modal alignment, speech-recognition | Code Available | 1 | 5 |
| CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation | Nov 2, 2022 | cross-modal alignment, Decision Making | Code Available | 1 | 5 |
| BrainVis: Exploring the Bridge between Brain and Visual Signals via Image Reconstruction | Dec 22, 2023 | cross-modal alignment, EEG | Code Available | 1 | 5 |
| Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning | Jul 16, 2024 | Caption Generation, cross-modal alignment | Code Available | 1 | 5 |
| Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Large Model Enhancement | Jan 1, 2025 | cross-modal alignment, Knowledge Distillation | Code Available | 1 | 5 |
| BiPVL-Seg: Bidirectional Progressive Vision-Language Fusion with Global-Local Alignment for Medical Image Segmentation | Mar 30, 2025 | cross-modal alignment, Image Segmentation | Code Available | 1 | 5 |