| TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio Motion Embedding and Diffusion Interpolation | Oct 5, 2024 | cross-modal alignment, Retrieval | —Unverified | 0 |
| TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models | Jun 13, 2025 | cross-modal alignment, Segmentation | —Unverified | 0 |
| Temporal Order Preserved Optimal Transport-based Cross-modal Knowledge Transfer Learning for ASR | Sep 3, 2024 | Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Tiny-Align: Bridging Automatic Speech Recognition and Large Language Model on the Edge | Nov 21, 2024 | Automatic Speech Recognition (ASR) | —Unverified | 0 |
| TMCIR: Token Merge Benefits Composed Image Retrieval | Apr 15, 2025 | Contrastive Learning, cross-modal alignment | —Unverified | 0 |
| TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval | Sep 28, 2022 | cross-modal alignment, Retrieval | —Unverified | 0 |
| TOT: Topology-Aware Optimal Transport For Multimodal Hate Detection | Feb 27, 2023 | cross-modal alignment | —Unverified | 0 |
| Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images | Aug 31, 2023 | 3D Shape Generation, Contrastive Learning | —Unverified | 0 |
| Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques | Jun 5, 2025 | cross-modal alignment, Large Language Model | —Unverified | 0 |
| Transformer-based Spatial Grounding: A Comprehensive Survey | Jul 17, 2025 | cross-modal alignment, Survey | —Unverified | 0 |
| Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection | Sep 28, 2022 | 2D Object Detection, cross-modal alignment | —Unverified | 0 |
| TSDASeg: A Two-Stage Model with Direct Alignment for Interactive Point Cloud Segmentation | Jun 26, 2025 | cross-modal alignment, Interactive Segmentation | —Unverified | 0 |
| TS-HTFA: Advancing Time Series Forecasting via Hierarchical Text-Free Alignment with Large Language Models | Sep 23, 2024 | Contrastive Learning, cross-modal alignment | —Unverified | 0 |
| UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation | Jun 4, 2025 | cross-modal alignment, Lipreading | —Unverified | 0 |
| Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment | Jul 26, 2024 | cross-modal alignment, image-classification | —Unverified | 0 |
| UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting | Feb 25, 2025 | 3DGS, cross-modal alignment | —Unverified | 0 |
| Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces | May 18, 2018 | Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Video Referring Expression Comprehension via Transformer with Content-aware Query | Oct 6, 2022 | cross-modal alignment, Referring Expression | —Unverified | 0 |
| Video Referring Expression Comprehension via Transformer with Content-conditioned Query | Oct 25, 2023 | cross-modal alignment, Referring Expression | —Unverified | 0 |
| ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers | May 26, 2025 | cross-modal alignment, Position | —Unverified | 0 |
| VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix | Jun 17, 2022 | Contrastive Learning, cross-modal alignment | —Unverified | 0 |
| VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering | Apr 11, 2025 | cross-modal alignment, Information Retrieval | —Unverified | 0 |
| Wearable Accelerometer Foundation Models for Health via Knowledge Distillation | Dec 15, 2024 | Activity Recognition, cross-modal alignment | —Unverified | 0 |
| WhisQ: Cross-Modal Representation Learning for Text-to-Music MOS Prediction | Jun 6, 2025 | cross-modal alignment, Language Modeling | —Unverified | 0 |
| Multi-level Cross-modal Alignment for Image Clustering | Jan 22, 2024 | Clustering, cross-modal alignment | —Unverified | 0 |