| Visualizing Music Transformer | Oct 23, 2018 | ARC, Descriptive | —Unverified | 0 | 0 |
| Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning | Nov 15, 2024 | Descriptive, Object | —Unverified | 0 | 0 |
| Visual Localization by Learning Objects-Of-Interest Dense Match Regression | Jun 1, 2019 | Descriptive, regression | —Unverified | 0 | 0 |
| Semantically-Prompted Language Models Improve Visual Descriptions | Jun 5, 2023 | Classification, Descriptive | —Unverified | 0 | 0 |
| Visual Polarization Measurement Using Counterfactual Image Generation | Mar 13, 2025 | counterfactual, Descriptive | —Unverified | 0 | 0 |
| Visual question answering: from early developments to recent advances -- a survey | Jan 7, 2025 | Descriptive, Natural Language Understanding | —Unverified | 0 | 0 |
| Visual Reasoning with Natural Language | Oct 2, 2017 | Descriptive, Diversity | —Unverified | 0 | 0 |
| VoxCommunis: A Corpus for Cross-linguistic Phonetic Analysis | Jun 1, 2022 | Descriptive | —Unverified | 0 | 0 |
| VRConvMF: Visual Recurrent Convolutional Matrix Factorization for Movie Recommendation | Feb 16, 2022 | Descriptive, Movie Recommendation | —Unverified | 0 | 0 |
| Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction | May 8, 2018 | Descriptive, Multiple Instance Learning | —Unverified | 0 | 0 |