| Visual grounding for desktop graphical user interfaces | May 5, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models | Dec 5, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks | Feb 13, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Visual Text Generation in the Wild | Jul 19, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation | Oct 11, 2024 | Diagnostic, Language Modeling | Unverified | 0 |
| VL-Mamba: Exploring State Space Models for Multimodal Learning | Mar 20, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| VLMaterial: Procedural Material Generation with Large Vision-Language Models | Jan 27, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| VLMLight: Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning | May 26, 2025 | Large Language Model, Reinforcement Learning (RL) | Unverified | 0 |
| VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection | Sep 30, 2024 | Anomaly Detection, Language Modeling | Unverified | 0 |
| VMID: A Multimodal Fusion LLM Framework for Detecting and Identifying Misinformation of Short Videos | Nov 15, 2024 | Fake News Detection, Large Language Model | Unverified | 0 |
| Vocabulary Attack to Hijack Large Language Model Applications | Apr 3, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| VocalAgent: Large Language Models for Vocal Health Diagnostics with Safety-Aware Evaluation | May 19, 2025 | Diagnostic, Language Modeling | Unverified | 0 |
| VoiceWukong: Benchmarking Deepfake Voice Detection | Sep 10, 2024 | Benchmarking, Face Swapping | Unverified | 0 |
| VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks | Jul 29, 2024 | Deep Learning, Domain Generalization | Unverified | 0 |
| VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model | Jan 5, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving | Jul 9, 2024 | Autonomous Driving, Image to 3D | Unverified | 0 |
| VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models | Feb 16, 2024 | Adversarial Robustness, Language Modelling | Unverified | 0 |
| VSLLaVA: a pipeline of large multimodal foundation model for industrial vibration signal analysis | Sep 3, 2024 | Fault Diagnosis, Language Modeling | Unverified | 0 |
| WAFFLE: Multimodal Floorplan Understanding in the Wild | Dec 1, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model | Aug 30, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| WangLab at MEDIQA-CORR 2024: Optimized LLM-based Programs for Medical Error Detection and Correction | Apr 22, 2024 | Diversity, Language Modeling | Unverified | 0 |
| Source Attribution for Large Language Model-Generated Data | Oct 1, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning | Jun 1, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| WavLLM: Towards Robust and Adaptive Speech Large Language Model | Mar 31, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Weakly-Supervised HOI Detection from Interaction Labels Only and Language/Vision-Language Priors | Mar 9, 2023 | Human-Object Interaction Detection, Language Modeling | Unverified | 0 |