| On Incorporating Semantic Prior Knowledge in Deep Learning Through Embedding-Space Constraints | Sep 25, 2019 | Data Augmentation, Question Answering | —Unverified | 0 | 0 |
| On Incorporating Semantic Prior Knowledge in Deep Learning Through Embedding-Space Constraints | Sep 30, 2019 | Data Augmentation, Question Answering | —Unverified | 0 | 0 |
| Visual Entailment: A Novel Task for Fine-Grained Image Understanding | Jan 20, 2019 | Natural Language Inference, Question Answering | —Unverified | 0 | 0 |
| On the Cognition of Visual Question Answering Models and Human Intelligence: A Comparative Study | Oct 4, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| On the Effects of Video Grounding on Language Models | Oct 1, 2022 | Image Captioning, Question Answering | —Unverified | 0 | 0 |
| On the Efficacy of Co-Attention Transformer Layers in Visual Question Answering | Jan 11, 2022 | POS, Question Answering | —Unverified | 0 | 0 |
| On the Flip Side: Identifying Counterexamples in Visual Question Answering | Jun 3, 2018 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering | Feb 24, 2020 | Question Answering, Referring Expression | —Unverified | 0 | 0 |
| Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering | Sep 4, 2019 | Image Captioning, Object | —Unverified | 0 | 0 |
| On the Limitations of Vision-Language Models in Understanding Image Transforms | Mar 12, 2025 | Question Answering, Video Generation | —Unverified | 0 | 0 |
| On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications | Dec 23, 2023 | geo-localization, image-classification | —Unverified | 0 | 0 |
| On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering | Aug 28, 2021 | Graph Attention, Question Answering | —Unverified | 0 | 0 |
| On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law | May 19, 2020 | Model Selection, Question Answering | —Unverified | 0 | 0 |
| Decouple Before Interact: Multi-Modal Prompt Learning for Continual Visual Question Answering | Jan 1, 2023 | Continual Learning, Language Modelling | —Unverified | 0 | 0 |
| Debating for Better Reasoning: An Unsupervised Multimodal Approach | May 20, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| DDRprog: A CLEVR Differentiable Dynamic Reasoning Programmer | Mar 30, 2018 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Visual Entailment Task for Visually-Grounded Language Learning | Nov 26, 2018 | Grounded language learning, Natural Language Inference | —Unverified | 0 | 0 |
| Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation | Nov 11, 2019 | Domain Adaptation, Question Answering | —Unverified | 0 | 0 |
| Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation | Oct 27, 2023 | Image Generation, Question Answering | —Unverified | 0 | 0 |
| Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond | Oct 23, 2023 | counterfactual, Multiple-choice | —Unverified | 0 | 0 |
| Visual Explanations from Hadamard Product in Multimodal Deep Networks | Dec 18, 2017 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Aligned Dual Channel Graph Convolutional Network for Visual Question Answering | Jul 1, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Optimizing Explanations by Network Canonization and Hyperparameter Search | Nov 30, 2022 | Explainable Artificial Intelligence (XAI), image-classification | —Unverified | 0 | 0 |
| Optimizing Visual Question Answering Models for Driving: Bridging the Gap Between Human and Machine Attention Patterns | Jun 13, 2024 | Autonomous Driving, Question Answering | —Unverified | 0 | 0 |
| Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation | Aug 7, 2024 | GPU, Question Answering | —Unverified | 0 | 0 |
| Order Matters: Exploring Order Sensitivity in Multimodal Large Language Models | Oct 22, 2024 | In-Context Learning, Question Answering | —Unverified | 0 | 0 |
| ORD: Object Relationship Discovery for Visual Dialogue Generation | Jun 15, 2020 | Dialogue Generation, Graph Attention | —Unverified | 0 | 0 |
| ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation | Mar 25, 2025 | Action Generation, Autonomous Driving | —Unverified | 0 | 0 |
| Visual Graph Question Answering with ASP and LLMs for Language Parsing | Feb 13, 2025 | Graph Question Answering, Optical Character Recognition | —Unverified | 0 | 0 |
| Data Metabolism: An Efficient Data Design Schema For Vision Language Model | Apr 10, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Data-Driven Calibration of Prediction Sets in Large Vision-Language Models Based on Inductive Conformal Prediction | Apr 24, 2025 | Conformal Prediction, Hallucination | —Unverified | 0 | 0 |
| Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering | Nov 1, 2018 | Factual Visual Question Answering, General Knowledge | —Unverified | 0 | 0 |
| Visual Grounding Strategies for Text-Only Natural Language Processing | Mar 25, 2021 | Image Retrieval, Language Modeling | —Unverified | 0 | 0 |
| Data Augmentation for Visual Question Answering | Sep 1, 2017 | Data Augmentation, General Classification | —Unverified | 0 | 0 |
| Overcoming Language Bias in Remote Sensing Visual Question Answering via Adversarial Training | Jun 1, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Overcoming Language Priors for Visual Question Answering Based on Knowledge Distillation | Jan 10, 2025 | Knowledge Distillation, Question Answering | —Unverified | 0 | 0 |
| Overcoming Language Priors in Visual Question Answering with Adversarial Regularization | Oct 8, 2018 | Question Answering, Visual Grounding | —Unverified | 0 | 0 |
| Visual Hallucination: Definition, Quantification, and Prescriptive Remediations | Mar 26, 2024 | Hallucination, Image Captioning | —Unverified | 0 | 0 |
| DARE: Diverse Visual Question Answering with Robustness Evaluation | Sep 26, 2024 | image-classification, Image Classification | —Unverified | 0 | 0 |
| Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track | Dec 15, 2024 | Image Captioning, Medical Question Answering | —Unverified | 0 | 0 |
| OVQA: A Clinically Generated Visual Question Answering Dataset | Jul 7, 2022 | Benchmarking, Medical Visual Question Answering | —Unverified | 0 | 0 |
| OWLViz: An Open-World Benchmark for Visual Question Answering | Mar 4, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| PaLI: A Jointly-Scaled Multilingual Language-Image Model | Sep 14, 2022 | Decoder, Few-Shot Image Classification | —Unverified | 0 | 0 |
| Damage Assessment after Natural Disasters with UAVs: Semantic Feature Extraction using Deep Learning | Dec 14, 2024 | Decision Making, Question Answering | —Unverified | 0 | 0 |
| PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter | Feb 16, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Cycle-Consistency for Robust Visual Question Answering | Feb 15, 2019 | Question Answering, Question Generation | —Unverified | 0 | 0 |
| PAM: Understanding Product Images in Cross Product Category Attribute Extraction | Jun 8, 2021 | Attribute, Attribute Extraction | —Unverified | 0 | 0 |
| CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark | Jun 10, 2024 | Diversity, Question Answering | —Unverified | 0 | 0 |
| C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset | Apr 26, 2017 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| AI2D-RST: A multimodal corpus of 1000 primary school science diagrams | Dec 9, 2019 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |