| Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Large Model Enhancement | Jan 1, 2025 | cross-modal alignment, Knowledge Distillation | Code Available | 1 | 5 |
| LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation | Mar 25, 2025 | cross-modal alignment, Open Vocabulary Semantic Segmentation | Code Available | 1 | 5 |
| ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning | May 31, 2023 | cross-modal alignment, Representation Learning | Code Available | 1 | 5 |
| GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency | Dec 12, 2024 | cross-modal alignment, Transfer Learning | Code Available | 1 | 5 |
| DanceIt: Music-inspired Dancing Video Synthesis | Sep 17, 2020 | cross-modal alignment, Rhythm | Code Available | 1 | 5 |
| Conditional Variational Autoencoder for Sign Language Translation with Cross-Modal Alignment | Dec 25, 2023 | cross-modal alignment, Decoder | Code Available | 1 | 5 |
| BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning | Jun 17, 2022 | cross-modal alignment, Representation Learning | Code Available | 1 | 5 |
| CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment | Mar 10, 2023 | cross-modal alignment, Sign Language Recognition | Code Available | 1 | 5 |
| CoMP: Continual Multimodal Pre-training for Vision Foundation Models | Mar 24, 2025 | cross-modal alignment | Code Available | 1 | 5 |
| MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | Sep 20, 2024 | cross-modal alignment, Referring Expression | Code Available | 1 | 5 |