| MMIU: Dataset for Visual Intent Understanding in Multimodal Assistants | Oct 13, 2021 | intent-classificationIntent Classification | —Unverified | 0 |
| OVQA: A Clinically Generated Visual Question Answering Dataset | Jul 7, 2022 | BenchmarkingMedical Visual Question Answering | —Unverified | 0 |
| OWLViz: An Open-World Benchmark for Visual Question Answering | Mar 4, 2025 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| PaLI: A Jointly-Scaled Multilingual Language-Image Model | Sep 14, 2022 | DecoderFew-Shot Image Classification | —Unverified | 0 |
| PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter | Feb 16, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| PAM: Understanding Product Images in Cross Product Category Attribute Extraction | Jun 8, 2021 | AttributeAttribute Extraction | —Unverified | 0 |
| ParsVQA-Caps: A Benchmark for Visual Question Answering and Image Captioning in Persian | Dec 7, 2022 | Image CaptioningQuestion Answering | —Unverified | 0 |
| Patch-level Sounding Object Tracking for Audio-Visual Question Answering | Dec 14, 2024 | Audio-visual Question AnsweringObject Tracking | —Unverified | 0 |
| Pathological Visual Question Answering | Oct 6, 2020 | AI AgentQuestion Answering | —Unverified | 0 |
| PAT: Parallel Attention Transformer for Visual Question Answering in Vietnamese | Jul 17, 2023 | Question AnsweringVietnamese Visual Question Answering | —Unverified | 0 |
| PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering | Apr 19, 2024 | ArticlesInformation Retrieval | —Unverified | 0 |
| PDFVQA: A New Dataset for Real-World VQA on PDF Documents | Apr 13, 2023 | document understandingKey Information Extraction | —Unverified | 0 |
| PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models | Mar 16, 2025 | Machine UnlearningPrivacy Preserving | —Unverified | 0 |
| Performance Analysis of Traditional VQA Models Under Limited Computational Resources | Feb 9, 2025 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| Parameter Efficient Reinforcement Learning from Human Feedback | Mar 15, 2024 | Question Answeringreinforcement-learning | —Unverified | 0 |
| PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly | Jun 10, 2025 | Question AnsweringScene Understanding | —Unverified | 0 |
| Physically Grounded Vision-Language Models for Robotic Manipulation | Sep 5, 2023 | Image CaptioningLanguage Modelling | —Unverified | 0 |
| PiggyBack: Pretrained Visual Question Answering Environment for Backing up Non-deep Learning Professionals | Nov 29, 2022 | Deep LearningQuestion Answering | —Unverified | 0 |
| PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs | Feb 12, 2024 | Instruction FollowingLogical Reasoning | —Unverified | 0 |
| Playing Lottery Tickets with Vision and Language | Apr 23, 2021 | Image-text RetrievalQuestion Answering | —Unverified | 0 |
| POINTS: Improving Your Vision-language Model with Affordable Strategies | Sep 7, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Polar-VQA: Visual Question Answering on Remote Sensed Ice sheet Imagery from Polar Region | Mar 13, 2023 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models | Jun 14, 2024 | DecoderKnowledge Graphs | —Unverified | 0 |
| Predicting Relative Depth between Objects from Semantic Features | Jan 12, 2021 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| PreSTU: Pre-Training for Scene-Text Understanding | Sep 12, 2022 | DecoderImage Captioning | —Unverified | 0 |
| Pre-training image-language transformers for open-vocabulary tasks | Sep 9, 2022 | Question AnsweringVisual Entailment | —Unverified | 0 |
| Privacy Preserving Visual Question Answering | Feb 15, 2022 | Privacy PreservingQuestion Answering | —Unverified | 0 |
| Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering | Feb 21, 2019 | counterfactualQuestion Answering | —Unverified | 0 |
| Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training | Jun 25, 2021 | Image-text RetrievalQuestion Answering | —Unverified | 0 |
| Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training | May 21, 2021 | Question AnsweringRelation | —Unverified | 0 |
| Probing the Role of Positional Information in Vision-Language Models | Jan 16, 2022 | Contrastive LearningImage-text matching | —Unverified | 0 |
| Probing the Role of Positional Information in Vision-Language Models | May 17, 2023 | Contrastive LearningImage-text matching | —Unverified | 0 |
| Probing Visual Language Priors in VLMs | Dec 31, 2024 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data | Jul 17, 2024 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment | Jun 17, 2024 | Logical ReasoningMath | —Unverified | 0 |
| Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models | May 24, 2024 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| Prompt-based Personalized Federated Learning for Medical Visual Question Answering | Feb 15, 2024 | Federated LearningMedical Visual Question Answering | —Unverified | 0 |
| PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3 | Jan 1, 2023 | Image CaptioningQuestion Answering | —Unverified | 0 |
| Prompting Large Language Models with Rationale Heuristics for Knowledge-based Visual Question Answering | Dec 22, 2024 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention | May 5, 2021 | Question AnsweringReferring Expression | —Unverified | 0 |
| Proposing Plausible Answers for Open-ended Visual Question Answering | Oct 20, 2016 | Graph MatchingOpen-Ended Question Answering | —Unverified | 0 |
| PropTest: Automatic Property Testing for Improved Visual Programming | Mar 25, 2024 | Question AnsweringReferring Expression | —Unverified | 0 |
| Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning | Jun 11, 2025 | In-Context LearningQuestion Answering | —Unverified | 0 |
| Psycholinguistics meets Continual Learning: Measuring Catastrophic Forgetting in Visual Question Answering | Jun 10, 2019 | Continual LearningQuestion Answering | —Unverified | 0 |
| Pushing the Limits of Radiology with Joint Modeling of Visual and Textual Information | Jul 1, 2018 | Image ClassificationMachine Translation | —Unverified | 0 |
| Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering | Jul 30, 2024 | Code GenerationQuestion Answering | —Unverified | 0 |
| Q2ATransformer: Improving Medical VQA via an Answer Querying Decoder | Apr 4, 2023 | ClassificationDecoder | —Unverified | 0 |
| QIRL: Boosting Visual Question Answering via Optimized Question-Image Relation Learning | Apr 4, 2025 | Data AugmentationImage Generation | —Unverified | 0 |
| Question-Agnostic Attention for Visual Question Answering | Aug 9, 2019 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| Question-Conditioned Counterfactual Image Generation for VQA | Nov 14, 2019 | counterfactualImage Generation | —Unverified | 0 |