| Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis | Nov 26, 2024 | Decoder, multimodal generation | —Unverified | 0 | 0 |
| VisLanding: Monocular 3D Perception for UAV Safe Landing via Depth-Normal Synergy | Jun 17, 2025 | Decision Making, Semantic Segmentation | —Unverified | 0 | 0 |
| Visual Image Reconstruction from Brain Activity via Latent Representation | May 13, 2025 | Early Classification, Image Reconstruction | —Unverified | 0 | 0 |
| ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers | May 26, 2025 | cross-modal alignment, Position | —Unverified | 0 | 0 |
| VQ-AR: Vector Quantized Autoregressive Probabilistic Time Series Forecasting | May 31, 2022 | Decision Making, Inductive Bias | —Unverified | 0 | 0 |
| WeLM: A Well-Read Pre-trained Language Model for Chinese | Sep 21, 2022 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| What Matters for Model Merging at Scale? | Oct 4, 2024 | model, Task Arithmetic | —Unverified | 0 | 0 |
| What Matters to You? Towards Visual Representation Alignment for Robot Learning | Oct 11, 2023 | Zero-shot Generalization | —Unverified | 0 | 0 |
| WHISTRESS: Enriching Transcriptions with Sentence Stress Detection | May 25, 2025 | Sentence, Zero-shot Generalization | —Unverified | 0 | 0 |
| WiFo: Wireless Foundation Model for Channel Prediction | Dec 12, 2024 | model, Multi-Task Learning | —Unverified | 0 | 0 |