| Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding | Jan 24, 2022 | Question Answering, Question Generation | Unverified | 0 |
| Probing the Role of Positional Information in Vision-Language Models | Jan 16, 2022 | Contrastive Learning, Image-text matching | Unverified | 0 |
| Task Formulation Matters When Learning Continuously: A Case Study in Visual Question Answering | Jan 16, 2022 | Continual Learning, Incremental Learning | Unverified | 0 |
| MANGO: Enhancing the Robustness of VQA Models via Adversarial Noise Generation | Jan 16, 2022 | Logical Reasoning, Question Answering | Unverified | 0 |
| Retrieving Visual Facts For Few-Shot Visual Question Answering | Jan 16, 2022 | Language Modeling, Language Modelling | Unverified | 0 |
| All You May Need for VQA are Image Captions | Jan 16, 2022 | All, Image Captioning | Unverified | 0 |
| CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks | Jan 15, 2022 | Question Answering, Visual Commonsense Reasoning | Unverified | 0 |
| A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering | Jan 14, 2022 | Generative Question Answering, Image to text | Unverified | 0 |
| Towards Automated Error Analysis: Learning to Characterize Errors | Jan 13, 2022 | Common Sense Reasoning, Meta-Learning | Unverified | 0 |
| On the Efficacy of Co-Attention Transformer Layers in Visual Question Answering | Jan 11, 2022 | POS, Question Answering | Unverified | 0 |
| Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training | Jan 11, 2022 | Decoder, Image Captioning | Unverified | 0 |
| COIN: Counterfactual Image Generation for VQA Interpretation | Jan 10, 2022 | counterfactual, Image Generation | Unverified | 0 |
| Interactive Attention AI to translate low light photos to captions for night scene understanding in women safety | Jan 4, 2022 | Decoder, Deep Learning | Unverified | 0 |
| Query and Attention Augmentation for Knowledge-Based Explainable Reasoning | Jan 1, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |
| V-Doc: Visual Questions Answers With Documents | Jan 1, 2022 | Question Answering, Question Generation | Unverified | 0 |
| Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering | Jan 1, 2022 | Generative Question Answering, Image to text | Unverified | 0 |
| Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture | Jan 1, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain? | Dec 27, 2021 | Articles, Medical Visual Question Answering | Unverified | 0 |
| Multi-Image Visual Question Answering | Dec 27, 2021 | Question Answering, Visual Question Answering | Code Available | 0 |
| General Greedy De-bias Learning | Dec 20, 2021 | image-classification, Image Classification | Code Available | 0 |
| Task-Oriented Multi-User Semantic Communications | Dec 19, 2021 | Image Retrieval, Machine Translation | Unverified | 0 |
| Understanding Attention for Vision-and-Language Tasks | Dec 17, 2021 | Image Generation, Image Retrieval | Unverified | 0 |
| 3D Question Answering | Dec 15, 2021 | 3D geometry, Question Answering | Unverified | 0 |
| Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection | Dec 13, 2021 | Common Sense Reasoning, Knowledge Graph Embeddings | Unverified | 0 |
| Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation | Dec 10, 2021 | Image-text matching, Image-text Retrieval | Unverified | 0 |
| MoCA: Incorporating Multi-stage Domain Pretraining and Cross-guided Multimodal Attention for Textbook Question Answering | Dec 6, 2021 | Language Modelling, Question Answering | Unverified | 0 |
| eaVQA: An Experimental Analysis on Visual Question Answering Models | Dec 1, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Curriculum Learning Effectively Improves Low Data VQA | Dec 1, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Scallop: From Probabilistic Deductive Databases to Scalable Differentiable Reasoning | Dec 1, 2021 | Logical Reasoning, Question Answering | Unverified | 0 |
| Robust Visual Reasoning via Language Guided Neural Module Networks | Dec 1, 2021 | Question Answering, Referring Expression | Unverified | 0 |
| Scene Graph Generation with Geometric Context | Nov 25, 2021 | Activity Recognition, Graph Generation | Unverified | 0 |
| A Confidence-Based Interface for Neuro-Symbolic Visual Question Answering | Nov 21, 2021 | Question Answering, Translation | Unverified | 0 |
| Medical Visual Question Answering: A Survey | Nov 19, 2021 | Medical Visual Question Answering, Question Answering | Unverified | 0 |
| UFO: A UniFied TransfOrmer for Vision-Language Representation Learning | Nov 19, 2021 | Image Captioning, Image-text matching | Unverified | 0 |
| Achieving Human Parity on Visual Question Answering | Nov 17, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities | Nov 16, 2021 | Articles, Face Recognition | Code Available | 0 |
| Language bias in Visual Question Answering: A Survey and Taxonomy | Nov 16, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation | Nov 16, 2021 | Image Captioning, Knowledge Distillation | Unverified | 0 |
| Question-Led Semantic Structure Enhanced Attentions for VQA | Nov 16, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Document AI: Benchmarks, Models and Applications | Nov 16, 2021 | Deep Learning, Document AI | Unverified | 0 |
| Breaking Down Questions for Outside-Knowledge Visual Question Answering | Nov 16, 2021 | Graph Neural Network, Question Answering | Unverified | 0 |
| Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base | Nov 16, 2021 | Question Answering, Semantic Similarity | Unverified | 0 |
| Co-VQA : Answering by Interactive Sub Question Sequence | Nov 16, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Graph Relation Transformer: Incorporating pairwise object features into the Transformer architecture | Nov 11, 2021 | Graph Attention, Question Answering | Unverified | 0 |
| Visual Question Answering based on Formal Logic | Nov 8, 2021 | Formal Logic, Question Answering | Unverified | 0 |
| Diversity and Consistency: Exploring Visual Question-Answer Pair Generation | Nov 1, 2021 | Diversity, Question Answering | Unverified | 0 |
| CrossVQA: Scalably Generating Benchmarks for Systematically Testing VQA Generalization | Nov 1, 2021 | Answer Generation, Question-Answer-Generation | Unverified | 0 |
| MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering | Nov 1, 2021 | multimodal interaction, Multiple-choice | Code Available | 0 |
| Perceptual Score: What Data Modalities Does Your Model Perceive? | Oct 27, 2021 | Question Answering, Visual Dialog | Code Available | 0 |
| Alignment Attention by Matching Key and Query Distributions | Oct 25, 2021 | Graph Attention, Question Answering | Code Available | 0 |