| AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment | May 8, 2023 | cross-modal alignment, Rhythm | Unverified | 0 |
| Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining | Apr 26, 2023 | cross-modal alignment, Medical Visual Question Answering | Code Available | 1 |
| CoVLR: Coordinating Cross-Modal Consistency and Intra-Modal Structure for Vision-Language Retrieval | Apr 15, 2023 | cross-modal alignment, Cross-Modal Retrieval | Unverified | 0 |
| Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation | Apr 6, 2023 | audio-visual learning, Contrastive Learning | Code Available | 1 |
| SoftCLIP: Softer Cross-modal Alignment Makes CLIP Stronger | Mar 30, 2023 | cross-modal alignment, zero-shot-classification | Unverified | 0 |
| Unmasked Teacher: Towards Training-Efficient Video Foundation Models | Mar 28, 2023 | Action Classification, Action Recognition | Code Available | 0 |
| Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens | Mar 27, 2023 | Contrastive Learning, cross-modal alignment | Code Available | 1 |
| CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment | Mar 10, 2023 | cross-modal alignment, Sign Language Recognition | Code Available | 1 |
| LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion | Mar 7, 2023 | 3D Object Detection, cross-modal alignment | Code Available | 0 |
| HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention | Mar 6, 2023 | cross-modal alignment | Code Available | 1 |