| VALSE: A Task-Independent Benchmark for Vision and Language Models centered on Linguistic Phenomena | Aug 17, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis | Aug 10, 2021 | Language Modeling, Language Modelling | Code Available | 0 |
| LRRA: A Transparent Neural-Symbolic Reasoning Framework for Real-World Visual Question Answering | Aug 1, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| In Factuality: Efficient Integration of Relevant Facts for Visual Question Answering | Aug 1, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| 利用图像描述与知识图谱增强表示的视觉问答 (Exploiting Image Captions and External Knowledge as Representation Enhancement for Visual Question Answering) | Aug 1, 2021 | Image Captioning, Question Answering | Unverified | 0 |
| Towards Visual Question Answering on Pathology Images | Aug 1, 2021 | Decision Making, Question Answering | Code Available | 0 |
| X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering | Jul 24, 2021 | Attribute, Out-of-Distribution Generalization | Code Available | 0 |
| MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering | Jul 7, 2021 | Medical Visual Question Answering, Missing Labels | Unverified | 0 |
| Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory | Jul 4, 2021 | Question Answering, Scene Understanding | Code Available | 0 |
| Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs | Jun 28, 2021 | Question Answering, Task 2 | Unverified | 0 |
| Multimodal Few-Shot Learning with Frozen Language Models | Jun 25, 2021 | Few-Shot Learning, Language Modeling | Unverified | 0 |
| Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training | Jun 25, 2021 | Image-text Retrieval, Question Answering | Unverified | 0 |
| A Picture May Be Worth a Hundred Words for Visual Question Answering | Jun 25, 2021 | Data Augmentation, Descriptive | Unverified | 0 |
| VQA-Aid: Visual Question Answering for Post-Disaster Damage Assessment and Analysis | Jun 19, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| How Modular Should Neural Module Networks Be for Systematic Generalization? | Jun 15, 2021 | Question Answering, Systematic Generalization | Code Available | 0 |
| NAAQA: A Neural Architecture for Acoustic Question Answering | Jun 11, 2021 | Acoustic Question Answering, Question Answering | Code Available | 0 |
| Bayesian Attention Belief Networks | Jun 9, 2021 | Decoder, Machine Translation | Unverified | 0 |
| Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions | Jun 8, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| PAM: Understanding Product Images in Cross Product Category Attribute Extraction | Jun 8, 2021 | Attribute, Attribute Extraction | Unverified | 0 |
| Human-Adversarial Visual Question Answering | Jun 4, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Grounding Complex Navigational Instructions Using Scene Graphs | Jun 3, 2021 | Question Answering, reinforcement-learning | Unverified | 0 |
| MIMOQA: Multimodal Input Multimodal Output Question Answering | Jun 1, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Semantic Aligned Multi-modal Transformer for Vision-Language Understanding: A Preliminary Study on Visual QA | Jun 1, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models | Jun 1, 2021 | Data Augmentation, Question Answering | Unverified | 0 |
| CLEVR\_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images | Jun 1, 2021 | Question Answering, Visual Question Answering | Code Available | 0 |