| VIANA: Visual Interactive Annotation of Argumentation | Jul 29, 2019 | Language Modeling, Language Modelling | —Unverified | 0 |
| ViDAS: Vision-based Danger Assessment and Scoring | Oct 1, 2024 | Fixed Few Shot Prompting, Fixed Few Shot Prompting Danger Assessment | —Unverified | 0 |
| Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction | Jul 8, 2018 | Decoder, Language Modeling | —Unverified | 0 |
| Video Description: A Survey of Methods, Datasets and Evaluation Metrics | Jun 1, 2018 | Diversity, Language Modeling | —Unverified | 0 |
| Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model | Aug 21, 2024 | Emotion Recognition, Language Modeling | —Unverified | 0 |
| VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos | Nov 7, 2024 | Decoder, Language Modeling | —Unverified | 0 |
| Video Imprint | Jun 7, 2021 | Language Modeling, Language Modelling | —Unverified | 0 |
| Video Language Model Pretraining with Spatio-temporal Masking | Jan 1, 2025 | Decoder, Language Modeling | —Unverified | 0 |
| VideoLLM-online: Online Video Large Language Model for Streaming Video | Jun 17, 2024 | GPU, Language Modeling | —Unverified | 0 |
| VideoOrion: Tokenizing Object Dynamics in Videos | Nov 25, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| VideoPoet: A Large Language Model for Zero-Shot Video Generation | Dec 21, 2023 | Decoder, Language Modeling | —Unverified | 0 |
| Video-VoT-R1: An efficient video inference model integrating image packing and AoE architecture | Mar 20, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| VidLPRO: A Video-Language Pre-training Framework for Robotic and Laparoscopic Surgery | Sep 7, 2024 | Computational Efficiency, Contrastive Learning | —Unverified | 0 |
| ViLAaD: Enhancing "Attracting and Dispersing" Source-Free Domain Adaptation with Vision-and-Language Model | Mar 30, 2025 | Domain Adaptation, Language Modeling | —Unverified | 0 |
| ViLLM-Eval: A Comprehensive Evaluation Suite for Vietnamese Large Language Models | Apr 17, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models | Nov 13, 2023 | counterfactual, Language Modeling | —Unverified | 0 |
| ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation | Aug 31, 2023 | Image-text matching, Language Modeling | —Unverified | 0 |
| Vi-Mistral-X: Building a Vietnamese Language Model with Advanced Continual Pre-training | Mar 20, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| VinaLLaMA: LLaMA-based Vietnamese Foundation Model | Dec 18, 2023 | Language Modeling, Language Modelling | —Unverified | 0 |
| Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | Aug 22, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation | May 25, 2023 | Decoder, Language Modeling | —Unverified | 0 |
| Violet: A Vision-Language Model for Arabic Image Captioning with Gemini Decoder | Nov 15, 2023 | Decoder, Image Captioning | —Unverified | 0 |
| ViPer: Visual Personalization of Generative Models via Individual Preference Learning | Jul 24, 2024 | Image Generation, Language Modeling | —Unverified | 0 |
| Virtual Scientific Companion for Synchrotron Beamlines: A Prototype | Dec 28, 2023 | Language Modeling, Language Modelling | —Unverified | 0 |
| Vision and Intention Boost Large Language Model in Long-Term Action Anticipation | May 3, 2025 | Action Anticipation, In-Context Learning | —Unverified | 0 |
| Vision-Based Generic Potential Function for Policy Alignment in Multi-Agent Reinforcement Learning | Feb 19, 2025 | Common Sense Reasoning, Language Modeling | —Unverified | 0 |
| Vision-centric Token Compression in Large Language Model | Feb 2, 2025 | In-Context Learning, Language Modeling | —Unverified | 0 |
| VisionGPT: Vision-Language Understanding Agent Using Generalized Multimodal Framework | Mar 14, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| Vision-Integrated LLMs for Autonomous Driving Assistance: Human Performance Comparison and Trust Evaluation | Feb 6, 2025 | Autonomous Driving, Decision Making | —Unverified | 0 |
| Vision-Language Adaptive Mutual Decoder for OOV-STR | Sep 2, 2022 | Decoder, Language Modeling | —Unverified | 0 |
| Vision-language Assisted Attribute Learning | Dec 12, 2023 | Attribute, Language Modeling | —Unverified | 0 |
| Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction | Feb 28, 2024 | Image Captioning, Language Modeling | —Unverified | 0 |
| Vision-Language Model-Based Semantic-Guided Imaging Biomarker for Early Lung Cancer Detection | Apr 30, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Vision Language Model for Interpretable and Fine-grained Detection of Safety Compliance in Diverse Workplaces | Aug 13, 2024 | Attribute, Language Modeling | —Unverified | 0 |
| Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives | May 20, 2025 | Caption Generation, Contrastive Learning | —Unverified | 0 |
| Vision Language Modeling of Content, Distortion and Appearance for Image Quality Assessment | Jun 14, 2024 | Image Quality Assessment, Language Modeling | —Unverified | 0 |
| Vision-Language Modeling with Regularized Spatial Transformer Networks for All Weather Crosswind Landing of Aircraft | May 9, 2024 | All, Language Modeling | —Unverified | 0 |
| Vision-Language Model IP Protection via Prompt-based Learning | Mar 4, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Vision Language Transformers: A Survey | Jul 6, 2023 | Language Modeling, Language Modelling | —Unverified | 0 |
| VisionLLM-based Multimodal Fusion Network for Glottic Carcinoma Early Detection | Dec 24, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| [Vision Paper] PRObot: Enhancing Patient-Reported Outcome Measures for Diabetic Retinopathy using Chatbots and Generative AI | Nov 5, 2024 | Chatbot, Language Modeling | —Unverified | 0 |
| VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions | Jul 17, 2024 | Autonomous Vehicles, Language Modeling | —Unverified | 0 |
| A Multi-Modal Foundation Model to Assist People with Blindness and Low Vision in Environmental Interaction | Oct 31, 2023 | Language Modeling, Language Modelling | —Unverified | 0 |
| Visual attention models for scene text recognition | Jun 5, 2017 | Language Modeling, Language Modelling | —Unverified | 0 |
| Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning | Jun 3, 2022 | Image Paragraph Captioning, Language Modeling | —Unverified | 0 |
| Visual Comparison of Language Model Adaptation | Aug 17, 2022 | Language Modeling, Language Modelling | —Unverified | 0 |
| Visual Conceptual Blending with Large-scale Language and Vision Models | Jun 27, 2021 | Image Generation, Language Modeling | —Unverified | 0 |
| Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval | Apr 23, 2024 | Image Retrieval, Language Modeling | —Unverified | 0 |
| Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation | May 23, 2024 | Audio Generation, Denoising | —Unverified | 0 |
| Visual Features for Context-Aware Speech Recognition | Dec 1, 2017 | Language Modeling, Language Modelling | —Unverified | 0 |