| Curriculum Script Distillation for Multilingual Visual Question Answering | Jan 17, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| ParsVQA-Caps: A Benchmark for Visual Question Answering and Image Captioning in Persian | Dec 7, 2022 | Image Captioning, Question Answering | —Unverified | 0 | 0 |
| Curriculum Learning for Compositional Visual Reasoning | Mar 27, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Patch-level Sounding Object Tracking for Audio-Visual Question Answering | Dec 14, 2024 | Audio-visual Question Answering, Object Tracking | —Unverified | 0 | 0 |
| A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions | Mar 26, 2024 | Gaze Target Estimation, Question Answering | —Unverified | 0 | 0 |
| Pathological Visual Question Answering | Oct 6, 2020 | AI Agent, Question Answering | —Unverified | 0 | 0 |
| Curriculum Learning Effectively Improves Low Data VQA | Dec 1, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| CTRL-O: Language-Controllable Object-Centric Visual Representation Learning | Mar 27, 2025 | Image Generation, Object | —Unverified | 0 | 0 |
| PAT: Parallel Attention Transformer for Visual Question Answering in Vietnamese | Jul 17, 2023 | Question Answering, Vietnamese Visual Question Answering | —Unverified | 0 | 0 |
| CT-Agent: A Multimodal-LLM Agent for 3D CT Radiology Question Answering | May 22, 2025 | Computed Tomography (CT), Question Answering | —Unverified | 0 | 0 |
| PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering | Apr 19, 2024 | Articles, Information Retrieval | —Unverified | 0 | 0 |
| PDFVQA: A New Dataset for Real-World VQA on PDF Documents | Apr 13, 2023 | document understanding, Key Information Extraction | —Unverified | 0 | 0 |
| PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models | Mar 16, 2025 | Machine Unlearning, Privacy Preserving | —Unverified | 0 | 0 |
| CS-VQA: Visual Question Answering with Compressively Sensed Images | Jun 8, 2018 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| CrossVQA: Scalably Generating Benchmarks for Systematically Testing VQA Generalization | Nov 1, 2021 | Answer Generation, Question-Answer-Generation | —Unverified | 0 | 0 |
| A Free Lunch in Generating Datasets: Building a VQG and VQA System with Attention and Humans in the Loop | Nov 30, 2019 | Question Answering, Question Generation | —Unverified | 0 | 0 |
| Performance Analysis of Traditional VQA Models Under Limited Computational Resources | Feb 9, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Parameter Efficient Reinforcement Learning from Human Feedback | Mar 15, 2024 | Question Answering, reinforcement-learning | —Unverified | 0 | 0 |
| Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models | Oct 16, 2024 | Visual Question Answering | —Unverified | 0 | 0 |
| PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly | Jun 10, 2025 | Question Answering, Scene Understanding | —Unverified | 0 | 0 |
| Physically Grounded Vision-Language Models for Robotic Manipulation | Sep 5, 2023 | Image Captioning, Language Modelling | —Unverified | 0 | 0 |
| PiggyBack: Pretrained Visual Question Answering Environment for Backing up Non-deep Learning Professionals | Nov 29, 2022 | Deep Learning, Question Answering | —Unverified | 0 | 0 |
| Cross-Modal Retrieval Augmentation for Multi-Modal Classification | Apr 16, 2021 | Classification, Cross-Modal Retrieval | —Unverified | 0 | 0 |
| Why Does a Visual Question Have Different Answers? | Aug 12, 2019 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs | Feb 12, 2024 | Instruction Following, Logical Reasoning | —Unverified | 0 | 0 |