| A survey on VQA: Datasets and Approaches | May 2, 2021 | Question Answering, Survey | Unverified | 0 |
| Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads | Apr 30, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Document Collection Visual Question Answering | Apr 27, 2021 | Document Understanding, Question Answering | Unverified | 0 |
| InfographicVQA | Apr 26, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | Apr 26, 2021 | Generalized Referring Expression Comprehension, Phrase Grounding | Code Available | 1 |
| Playing Lottery Tickets with Vision and Language | Apr 23, 2021 | Image-text Retrieval, Question Answering | Unverified | 0 |
| GraghVQA: Language-Guided Graph Neural Networks for Graph-based Visual Question Answering | Apr 20, 2021 | Graph Neural Network, Graph Question Answering | Code Available | 1 |
| Cross-Modal Retrieval Augmentation for Multi-Modal Classification | Apr 16, 2021 | Classification, Cross-Modal Retrieval | Unverified | 0 |
| VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks | Apr 16, 2021 | Information Retrieval, Question Answering | Unverified | 0 |
| Jointly Learning Truth-Conditional Denotations and Groundings using Parallel Attention | Apr 14, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images | Apr 13, 2021 | Question Answering, Visual Question Answering | Code Available | 0 |
| Neuro-Symbolic VQA: A review from the perspective of AGI desiderata | Apr 13, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| How Transferable are Reasoning Patterns in VQA? | Apr 8, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Multimodal Continuous Visual Attention Mechanisms | Apr 7, 2021 | Clustering, Question Answering | Unverified | 0 |
| Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering | Apr 7, 2021 | Question Answering, Visual Question Answering | Code Available | 1 |
| Compressing Visual-linguistic Model via Knowledge Distillation | Apr 5, 2021 | Image Captioning, Knowledge Distillation | Unverified | 0 |
| MMBERT: Multimodal BERT Pretraining for Improved Medical VQA | Apr 3, 2021 | Language Modeling, Language Modelling | Code Available | 1 |
| VisQA: X-raying Vision and Language Reasoning in Transformers | Apr 2, 2021 | Question Answering, Visual Question Answering | Code Available | 1 |
| 'Just because you are right, doesn't mean I am wrong': Overcoming a bottleneck in development and evaluation of Open-Ended VQA tasks | Apr 1, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Towards General Purpose Vision Systems | Apr 1, 2021 | Question Answering, Visual Question Answering | Code Available | 1 |
| UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training | Apr 1, 2021 | Image-text Matching, Image-text Retrieval | Unverified | 0 |
| Are Bias Mitigation Techniques for Deep Learning Effective? | Apr 1, 2021 | Deep Learning, Question Answering | Code Available | 1 |
| Analysis on Image Set Visual Question Answering | Mar 31, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Domain-robust VQA with diverse datasets and methods but no target labels | Mar 29, 2021 | Domain Adaptation, Object Recognition | Unverified | 0 |
| Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers | Mar 29, 2021 | Decoder, Image Segmentation | Code Available | 1 |