| BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning | Jun 17, 2022 | cross-modal alignment, Representation Learning | Code Available | 1 | 5 |
| GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency | Dec 12, 2024 | cross-modal alignment, Transfer Learning | Code Available | 1 | 5 |
| DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors | Apr 6, 2022 | 3D geometry, 3D Object Detection | Code Available | 1 | 5 |
| Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning | Jul 16, 2024 | Caption Generation, cross-modal alignment | Code Available | 1 | 5 |
| BrainVis: Exploring the Bridge between Brain and Visual Signals via Image Reconstruction | Dec 22, 2023 | cross-modal alignment, EEG | Code Available | 1 | 5 |
| CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally | Feb 5, 2025 | Attribute, cross-modal alignment | Code Available | 1 | 5 |
| Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Large Model Enhancement | Jan 1, 2025 | cross-modal alignment, Knowledge Distillation | Code Available | 1 | 5 |
| CLIP-Driven Fine-grained Text-Image Person Re-identification | Oct 19, 2022 | cross-modal alignment, Person Re-Identification | Code Available | 1 | 5 |
| Boosting Masked ECG-Text Auto-Encoders as Discriminative Learners | Oct 3, 2024 | cross-modal alignment | Code Available | 1 | 5 |
| BiPVL-Seg: Bidirectional Progressive Vision-Language Fusion with Global-Local Alignment for Medical Image Segmentation | Mar 30, 2025 | cross-modal alignment, Image Segmentation | Code Available | 1 | 5 |