| Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering | Dec 14, 2021 | Graph Matching, Question Answering | Code Available | 1 |
| Change Detection Meets Visual Question Answering | Dec 12, 2021 | Answer Generation, Change Detection | Code Available | 1 |
| Debiased Visual Question Answering from Feature and Sample Perspectives | Dec 1, 2021 | Bias Detection, Question Answering | Code Available | 1 |
| Searching the Search Space of Vision Transformer | Nov 29, 2021 | Neural Architecture Search, object-detection | Code Available | 1 |
| UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling | Nov 23, 2021 | Image Captioning, Image Description | Code Available | 1 |
| Many Heads but One Brain: Fusion Brain -- a Competition and a Single Multimodal Multitask Architecture | Nov 22, 2021 | Handwritten Text Recognition, object-detection | Code Available | 1 |
| Florence: A New Foundation Model for Computer Vision | Nov 22, 2021 | Action Classification, Action Recognition | Code Available | 1 |
| ViVQA: Vietnamese Visual Question Answering | Nov 1, 2021 | Question Answering, Vietnamese Visual Question Answering | Code Available | 1 |
| IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning | Oct 25, 2021 | Arithmetic Reasoning, Mathematical Question Answering | Code Available | 1 |
| Label-Descriptive Patterns and Their Application to Characterizing Classification Errors | Oct 18, 2021 | Descriptive, named-entity-recognition | Code Available | 1 |
| Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos | Oct 11, 2021 | Audio-visual Question Answering, Question Answering | Code Available | 1 |
| Coarse-to-Fine Reasoning for Visual Question Answering | Oct 6, 2021 | Question Answering, Visual Question Answering | Code Available | 1 |
| Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering | Oct 3, 2021 | counterfactual, Diagnostic | Code Available | 1 |
| The Spoon Is in the Sink: Assisting Visually Impaired People in the Kitchen | Oct 1, 2021 | Question Answering, Visual Question Answering | Code Available | 1 |
| Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images | Oct 1, 2021 | Question Answering, Visual Question Answering | Code Available | 1 |
| Does Vision-and-Language Pretraining Improve Lexical Grounding? | Sep 21, 2021 | Question Answering, Visual Question Answering | Code Available | 1 |
| xGQA: Cross-Lingual Visual Question Answering | Sep 13, 2021 | Cross-Lingual Transfer, Language Modeling | Code Available | 1 |
| An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA | Sep 10, 2021 | Image Captioning, Question Answering | Code Available | 1 |
| Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering | Sep 9, 2021 | Question Answering, Retrieval | Code Available | 1 |
| WebQA: Multihop and Multimodal QA | Sep 1, 2021 | Image Retrieval, Multimodal Reasoning | Code Available | 1 |
| SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | Aug 24, 2021 | Image Captioning, Language Modeling | Code Available | 1 |
| X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics | Aug 18, 2021 | Cross-Modal Retrieval, Decoder | Code Available | 1 |
| Task-Oriented Multi-User Semantic Communications for VQA Task | Aug 16, 2021 | Question Answering, Semantic Communication | Code Available | 1 |
| Sparse Continuous Distributions and Fenchel-Young Losses | Aug 4, 2021 | Audio Classification, Question Answering | Code Available | 1 |
| Check It Again: Progressive Visual Question Answering via Visual Entailment | Aug 1, 2021 | Question Answering, Visual Entailment | Code Available | 1 |
| Greedy Gradient Ensemble for Robust Visual Question Answering | Jul 27, 2021 | Question Answering, Visual Question Answering | Code Available | 1 |
| Separating Skills and Concepts for Novel Visual Question Answering | Jul 19, 2021 | Attribute, Contrastive Learning | Code Available | 1 |
| How Much Can CLIP Benefit Vision-and-Language Tasks? | Jul 13, 2021 | Question Answering, Vision and Language Navigation | Code Available | 1 |
| Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question Answering | Jul 13, 2021 | Navigate, Question Answering | Code Available | 1 |
| Zero-shot Visual Question Answering using Knowledge Graph | Jul 12, 2021 | Knowledge Graphs, Question Answering | Code Available | 1 |
| Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering | Jul 6, 2021 | Active Learning, Object Recognition | Code Available | 1 |
| RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words | Jun 19, 2021 | Decoder, Image Captioning | Code Available | 1 |
| Predicting Human Scanpaths in Visual Question Answering | Jun 19, 2021 | Deep Reinforcement Learning, Question Answering | Code Available | 1 |
| Perception Matters: Detecting Perception Failures of VQA Models Using Metamorphic Testing | Jun 19, 2021 | Benchmarking, DNN Testing | Code Available | 1 |
| Probing Image-Language Transformers for Verb Understanding | Jun 16, 2021 | Image Retrieval, Question Answering | Code Available | 1 |
| Check It Again: Progressive Visual Question Answering via Visual Entailment | Jun 8, 2021 | Question Answering, Visual Entailment | Code Available | 1 |
| Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training | May 24, 2021 | Image Captioning, Medical Visual Question Answering | Code Available | 1 |
| Multiple Meta-model Quantifying for Medical Visual Question Answering | May 19, 2021 | Medical Visual Question Answering, Meta-Learning | Code Available | 1 |
| Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules | May 11, 2021 | Question Answering, Visual Question Answering | Code Available | 1 |
| Passage Retrieval for Outside-Knowledge Visual Question Answering | May 9, 2021 | Image Captioning, Object | Code Available | 1 |
| MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | Apr 26, 2021 | Generalized Referring Expression Comprehension, Phrase Grounding | Code Available | 1 |
| GraghVQA: Language-Guided Graph Neural Networks for Graph-based Visual Question Answering | Apr 20, 2021 | Graph Neural Network, Graph Question Answering | Code Available | 1 |
| Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering | Apr 7, 2021 | Question Answering, Visual Question Answering | Code Available | 1 |
| MMBERT: Multimodal BERT Pretraining for Improved Medical VQA | Apr 3, 2021 | Language Modeling, Language Modelling | Code Available | 1 |
| VisQA: X-raying Vision and Language Reasoning in Transformers | Apr 2, 2021 | Question Answering, Visual Question Answering | Code Available | 1 |
| Are Bias Mitigation Techniques for Deep Learning Effective? | Apr 1, 2021 | Deep Learning, Question Answering | Code Available | 1 |
| Towards General Purpose Vision Systems | Apr 1, 2021 | Question Answering, Visual Question Answering | Code Available | 1 |
| Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers | Mar 29, 2021 | Decoder, Image Segmentation | Code Available | 1 |
| Multi-Modal Answer Validation for Knowledge-Based VQA | Mar 23, 2021 | Question Answering, Retrieval | Code Available | 1 |
| Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | Feb 18, 2021 | Decoder, Document Image Classification | Code Available | 1 |