| Do Explanations make VQA Models more Predictable to a Human? | Oct 29, 2018 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Latent Variable Models for Visual Question Answering | Jan 16, 2021 | Benchmarking, Question Answering | —Unverified | 0 |
| Generative Visual Question Answering | Jul 18, 2023 | Generative Visual Question Answering, Question Answering | —Unverified | 0 |
| American == White in Multimodal Language-and-Image AI | Jul 1, 2022 | Image Captioning, Question Answering | —Unverified | 0 |
| Abduction of Domain Relationships from Data for VQA | Feb 13, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Compound Tokens: Channel Fusion for Vision-Language Representation Learning | Dec 2, 2022 | Decoder, Language Modeling | —Unverified | 0 |
| MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs | Jun 24, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Generating Triples with Adversarial Networks for Scene Graph Construction | Feb 7, 2018 | Attribute, graph construction | —Unverified | 0 |
| Compositional Memory for Visual Question Answering | Nov 18, 2015 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness | Jan 16, 2025 | Adversarial Defense, Adversarial Robustness | —Unverified | 0 |
| Learning Answer Embeddings for Visual Question Answering | Jun 10, 2018 | Question Answering, Transfer Learning | —Unverified | 0 |
| Attention Mechanism based Cognition-level Scene Understanding | Apr 17, 2022 | Question Answering, Scene Understanding | —Unverified | 0 |
| Learning by Asking Questions | Dec 4, 2017 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Look, Learn and Leverage (L^3): Mitigating Visual-Domain Shift and Discovering Intrinsic Relations via Symbolic Alignment | Aug 30, 2024 | Question Answering, Representation Learning | —Unverified | 0 |
| Learning Compositional Representation for Few-shot Visual Question Answering | Feb 21, 2021 | Attribute, Question Answering | —Unverified | 0 |
| Generating Rationales in Visual Question Answering | Apr 4, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Generating Natural Questions from Images for Multimodal Assistants | Nov 17, 2020 | Common Sense Reasoning, Natural Questions | —Unverified | 0 |
| DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback | Nov 29, 2023 | Image Generation, Question Answering | —Unverified | 0 |
| Attention Guided Semantic Relationship Parsing for Visual Question Answering | Oct 5, 2020 | Object, Question Answering | —Unverified | 0 |
| Generating Natural Language Explanations for Visual Question Answering using Scene Graphs and Visual Attention | Feb 15, 2019 | Explanation Generation, Language Modeling | —Unverified | 0 |
| Learning How To Ask: Cycle-Consistency Refines Prompts in Multimodal Foundation Models | Feb 13, 2024 | Code Generation, HumanEval | —Unverified | 0 |
| Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge | May 30, 2023 | Answer Selection, Question Answering | —Unverified | 0 |
| Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues | Mar 1, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network | Sep 23, 2019 | Question Answering, Triplet | —Unverified | 0 |
| Compositional Attention Networks for Interpretability in Natural Language Question Answering | Oct 30, 2018 | Logical Reasoning, Question Answering | —Unverified | 0 |