| Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment | Jul 26, 2024 | cross-modal alignment, image-classification | —Unverified | 0 | 0 |
| UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting | Feb 25, 2025 | 3DGS, cross-modal alignment | —Unverified | 0 | 0 |
| Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces | May 18, 2018 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Video Referring Expression Comprehension via Transformer with Content-aware Query | Oct 6, 2022 | cross-modal alignment, Referring Expression | —Unverified | 0 | 0 |
| Video Referring Expression Comprehension via Transformer with Content-conditioned Query | Oct 25, 2023 | cross-modal alignment, Referring Expression | —Unverified | 0 | 0 |
| ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers | May 26, 2025 | cross-modal alignment, Position | —Unverified | 0 | 0 |
| VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix | Jun 17, 2022 | Contrastive Learning, cross-modal alignment | —Unverified | 0 | 0 |
| VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering | Apr 11, 2025 | cross-modal alignment, Information Retrieval | —Unverified | 0 | 0 |
| Wearable Accelerometer Foundation Models for Health via Knowledge Distillation | Dec 15, 2024 | Activity Recognition, cross-modal alignment | —Unverified | 0 | 0 |
| WhisQ: Cross-Modal Representation Learning for Text-to-Music MOS Prediction | Jun 6, 2025 | cross-modal alignment, Language Modeling | —Unverified | 0 | 0 |