| Recent, rapid advancement in visual question answering architecture: a review | Mar 2, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| On Modality Bias Recognition and Reduction | Feb 25, 2022 | Action Recognition, Multi-modal Classification | Code Available | 0 |
| Joint Answering and Explanation for Visual Commonsense Reasoning | Feb 25, 2022 | Knowledge Distillation, Question Answering | Code Available | 0 |
| Measuring CLEVRness: Blackbox testing of Visual Reasoning Models | Feb 24, 2022 | Benchmarking, Diagnostic | Unverified | 0 |
| OG-SGG: Ontology-Guided Scene Graph Generation. A Case Study in Transfer Learning for Telepresence Robotics | Feb 21, 2022 | BIG-bench Machine Learning, Graph Generation | Code Available | 0 |
| Vision-Language Pre-Training with Triple Contrastive Learning | Feb 21, 2022 | Contrastive Learning, cross-modal alignment | Code Available | 2 |
| Privacy Preserving Visual Question Answering | Feb 15, 2022 | Privacy Preserving, Question Answering | Unverified | 0 |
| Delving Deeper into Cross-lingual Visual Question Answering | Feb 15, 2022 | Inductive Bias, Question Answering | Code Available | 0 |
| An experimental study of the vision-bottleneck in VQA | Feb 14, 2022 | Object, Question Answering | Unverified | 0 |
| Can Open Domain Question Answering Systems Answer Visual Knowledge Questions? | Feb 9, 2022 | Open-Domain Question Answering, Question Answering | Unverified | 0 |
| NEWSKVQA: Knowledge-Aware News Video Question Answering | Feb 8, 2022 | Common Sense Reasoning, Management | Unverified | 0 |
| OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework | Feb 7, 2022 | Image Captioning, image-classification | Code Available | 0 |
| Grounding Answers for Visual Questions Asked by Visually Impaired People | Feb 4, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |
| Compositionality as Lexical Symmetry | Jan 30, 2022 | Data Augmentation, Inductive Bias | Code Available | 0 |
| Transformer Module Networks for Systematic Generalization in Visual Question Answering | Jan 27, 2022 | Question Answering, Systematic Generalization | Code Available | 0 |
| IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages | Jan 27, 2022 | Cross-Modal Retrieval, Few-Shot Learning | Code Available | 1 |
| Learning to Compose Diversified Prompts for Image Emotion Classification | Jan 26, 2022 | Classification, Emotion Classification | Unverified | 0 |
| MGA-VQA: Multi-Granularity Alignment for Visual Question Answering | Jan 25, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering | Jan 25, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding | Jan 24, 2022 | Question Answering, Question Generation | Unverified | 0 |
| MANGO: Enhancing the Robustness of VQA Models via Adversarial Noise Generation | Jan 16, 2022 | Logical Reasoning, Question Answering | Unverified | 0 |
| Task Formulation Matters When Learning Continuously: A Case Study in Visual Question Answering | Jan 16, 2022 | Continual Learning, Incremental Learning | Unverified | 0 |
| Retrieving Visual Facts For Few-Shot Visual Question Answering | Jan 16, 2022 | Language Modeling, Language Modelling | Unverified | 0 |
| Probing the Role of Positional Information in Vision-Language Models | Jan 16, 2022 | Contrastive Learning, Image-text matching | Unverified | 0 |
| All You May Need for VQA are Image Captions | Jan 16, 2022 | All, Image Captioning | Unverified | 0 |
| CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks | Jan 15, 2022 | Question Answering, Visual Commonsense Reasoning | Unverified | 0 |
| A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering | Jan 14, 2022 | Generative Question Answering, Image to text | Unverified | 0 |
| Towards Automated Error Analysis: Learning to Characterize Errors | Jan 13, 2022 | Common Sense Reasoning, Meta-Learning | Unverified | 0 |
| On the Efficacy of Co-Attention Transformer Layers in Visual Question Answering | Jan 11, 2022 | POS, Question Answering | Unverified | 0 |
| Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training | Jan 11, 2022 | Decoder, Image Captioning | Unverified | 0 |
| COIN: Counterfactual Image Generation for VQA Interpretation | Jan 10, 2022 | counterfactual, Image Generation | Unverified | 0 |
| Interactive Attention AI to translate low light photos to captions for night scene understanding in women safety | Jan 4, 2022 | Decoder, Deep Learning | Unverified | 0 |
| Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering | Jan 1, 2022 | Generative Question Answering, Image to text | Unverified | 0 |
| Maintaining Reasoning Consistency in Compositional Visual Question Answering | Jan 1, 2022 | Question Answering, Visual Question Answering | Code Available | 1 |
| Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture | Jan 1, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| V-Doc: Visual Questions Answers With Documents | Jan 1, 2022 | Question Answering, Question Generation | Unverified | 0 |
| Query and Attention Augmentation for Knowledge-Based Explainable Reasoning | Jan 1, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |
| Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain? | Dec 27, 2021 | Articles, Medical Visual Question Answering | Unverified | 0 |
| Multi-Image Visual Question Answering | Dec 27, 2021 | Question Answering, Visual Question Answering | Code Available | 0 |
| LaTr: Layout-Aware Transformer for Scene-Text VQA | Dec 23, 2021 | Optical Character Recognition (OCR), Question Answering | Code Available | 1 |
| Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation | Dec 22, 2021 | Common Sense Reasoning, Question Answering | Code Available | 1 |
| General Greedy De-bias Learning | Dec 20, 2021 | image-classification, Image Classification | Code Available | 0 |
| Task-Oriented Multi-User Semantic Communications | Dec 19, 2021 | Image Retrieval, Machine Translation | Unverified | 0 |
| Understanding Attention for Vision-and-Language Tasks | Dec 17, 2021 | Image Generation, Image Retrieval | Unverified | 0 |
| Distilled Dual-Encoder Model for Vision-Language Understanding | Dec 16, 2021 | Image to text, model | Code Available | 1 |
| 3D Question Answering | Dec 15, 2021 | 3D geometry, Question Answering | Unverified | 0 |
| Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering | Dec 14, 2021 | Graph Matching, Question Answering | Code Available | 1 |
| Dual-Key Multimodal Backdoors for Visual Question Answering | Dec 14, 2021 | Question Answering, Visual Question Answering | Code Available | 1 |
| Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection | Dec 13, 2021 | Common Sense Reasoning, Knowledge Graph Embeddings | Unverified | 0 |
| Change Detection Meets Visual Question Answering | Dec 12, 2021 | Answer Generation, Change Detection | Code Available | 1 |