| UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion | Jan 24, 2024 | Conditional Image Generation, Denoising | Unverified | 0 |
| Universal Item Tokenization for Transferable Generative Recommendation | Apr 6, 2025 | General Knowledge, Large Language Model | Unverified | 0 |
| UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning | May 20, 2025 | Large Language Model, Multimodal Large Language Model | Unverified | 0 |
| UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation | Mar 19, 2025 | Language Model Evaluation, Language Modeling | Unverified | 0 |
| VGR: Visual Grounded Reasoning | Jun 13, 2025 | Large Language Model, Math | Unverified | 0 |
| Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model | Aug 21, 2024 | Emotion Recognition, Language Modeling | Unverified | 0 |
| Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition | May 7, 2024 | Large Language Model, Multimodal Large Language Model | Unverified | 0 |
| Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | Aug 22, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks | Feb 13, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Visual Text Generation in the Wild | Jul 19, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation | Oct 11, 2024 | Diagnostic, Language Modeling | Unverified | 0 |
| VL-Mamba: Exploring State Space Models for Multimodal Learning | Mar 20, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection | Sep 30, 2024 | Anomaly Detection, Language Modeling | Unverified | 0 |
| VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks | Jul 29, 2024 | Deep Learning, Domain Generalization | Unverified | 0 |
| Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach | Oct 31, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models | May 26, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| When neural implant meets multimodal LLM: A dual-loop system for neuromodulation and naturalistic neuralbehavioral research | Mar 16, 2025 | EEG, Large Language Model | Unverified | 0 |
| WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image | Dec 3, 2024 | Diagnostic, Language Modeling | Unverified | 0 |
| LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery | Feb 26, 2024 | Continual Learning, Exemplar-Free | Code Available | 0 |
| Leveraging Multimodal LLM for Inspirational User Interface Search | Jan 29, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning | Nov 17, 2024 | Image Captioning, Language Modeling | Code Available | 0 |
| Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models | May 26, 2025 | image-classification, Image Classification | Code Available | 0 |
| Consistency-aware Fake Videos Detection on Short Video Platforms | Apr 30, 2025 | Large Language Model, Multimodal Large Language Model | Code Available | 0 |
| SCA: Improve Semantic Consistent in Unrestricted Adversarial Attacks via DDPM Inversion | Oct 3, 2024 | Adversarial Attack, Denoising | Code Available | 0 |
| Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models | Oct 15, 2024 | Hallucination, Large Language Model | Code Available | 0 |