| Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval | Apr 23, 2024 | Image Retrieval, Language Modeling | —Unverified | 0 | 0 |
| Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation | May 23, 2024 | Audio Generation, Denoising | —Unverified | 0 | 0 |
| Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | Apr 30, 2024 | Caption Generation, Hallucination | —Unverified | 0 | 0 |
| Visual grounding for desktop graphical user interfaces | May 5, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models | Dec 5, 2023 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks | Feb 13, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Visual Text Generation in the Wild | Jul 19, 2024 | Language Modelling, Large Language Model | —Unverified | 0 | 0 |
| ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation | Oct 11, 2024 | Diagnostic, Language Modeling | —Unverified | 0 | 0 |
| VL-Mamba: Exploring State Space Models for Multimodal Learning | Mar 20, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| VLMaterial: Procedural Material Generation with Large Vision-Language Models | Jan 27, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |