| Learning Visual Knowledge Memory Networks for Visual Question Answering | Jun 13, 2018 | Question Answering, Visual Question Answering | Unverified | 0 |
| Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision | Apr 20, 2020 | counterfactual, image-classification | Unverified | 0 |
| Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models | Nov 23, 2023 | Language Modelling, Large Language Model | Unverified | 0 |
| LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? | Mar 25, 2025 | Autonomous Navigation, Question Answering | Unverified | 0 |
| Less Is More: Linear Layers on CLIP Features as Powerful VizWiz Model | Jun 10, 2022 | Question Answering, Task 2 | Unverified | 0 |
| Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation | Jul 18, 2023 | Image Generation, Question Answering | Unverified | 0 |
| Leveraging Medical Visual Question Answering with Supporting Facts | May 28, 2019 | Diversity, Medical Visual Question Answering | Unverified | 0 |
| Leveraging Visual Question Answering for Image-Caption Ranking | May 4, 2016 | Image Retrieval, Question Answering | Unverified | 0 |
| Leveraging Visual Question Answering to Improve Text-to-Image Synthesis | Oct 28, 2020 | Auxiliary Learning, Image Generation | Unverified | 0 |
| Light as Deception: GPT-driven Natural Relighting Against Vision-Language Pre-training Models | May 30, 2025 | Image Captioning, Question Answering | Unverified | 0 |
| Lightweight In-Context Tuning for Multimodal Unified Models | Oct 8, 2023 | Image Captioning, In-Context Learning | Unverified | 0 |
| Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning | Jun 8, 2025 | Medical Report Generation, Question Answering | Unverified | 0 |
| LinguaMark: Do Multimodal Models Speak Fairly? A Benchmark-Based Evaluation | Jul 9, 2025 | Question Answering, Visual Question Answering | Unverified | 0 |
| Linguistically Driven Graph Capsule Network for Visual Question Reasoning | Mar 23, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering | Jan 1, 2021 | Novel Concepts, Question Answering | Unverified | 0 |
| LiT-4-RSVQA: Lightweight Transformer-based Visual Question Answering in Remote Sensing | Jun 1, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| 利用图像描述与知识图谱增强表示的视觉问答 (Exploiting Image Captions and External Knowledge as Representation Enhancement for Visual Question Answering) | Aug 1, 2021 | Image Captioning, Question Answering | Unverified | 0 |
| LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning | Jun 17, 2024 | Image Captioning, Question Answering | Unverified | 0 |
| LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding | Jan 9, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound | Oct 19, 2024 | Instruction Following, Knowledge Distillation | Unverified | 0 |
| LMME3DHF: Benchmarking and Evaluating Multimodal 3D Human Face Generation with LMMs | Apr 29, 2025 | Benchmarking, Face Generation | Unverified | 0 |
| Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling | Aug 20, 2021 | Data Ablation, Optical Character Recognition | Unverified | 0 |
| Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA | Apr 4, 2023 | Answer Generation, Language Modelling | Unverified | 0 |
| Logically Consistent Loss for Visual Question Answering | Nov 19, 2020 | Multi-Task Learning, Question Answering | Unverified | 0 |
| LOIS: Looking Out of Instance Semantics for Visual Question Answering | Jul 26, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |