| Measuring CLEVRness: Black-box Testing of Visual Reasoning Models | Sep 29, 2021 | Benchmarking, Diagnostic | Unverified | 0 |
| Crossformer: Transformer with Alternated Cross-Layer Guidance | Sep 29, 2021 | Inductive Bias, Machine Translation | Unverified | 0 |
| VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering | Sep 27, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Multimodal Integration of Human-Like Attention in Visual Question Answering | Sep 27, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| How to find a good image-text embedding for remote sensing visual question answering? | Sep 24, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Does Vision-and-Language Pretraining Improve Lexical Grounding? | Sep 21, 2021 | Question Answering, Visual Question Answering | Code Available | 1 |
| Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering | Sep 15, 2021 | Image Captioning, Knowledge Graphs | Code Available | 0 |
| xGQA: Cross-Lingual Visual Question Answering | Sep 13, 2021 | Cross-Lingual Transfer, Language Modeling | Code Available | 1 |
| Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering | Sep 13, 2021 | Data Augmentation, Question Answering | Code Available | 0 |
| An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA | Sep 10, 2021 | Image Captioning, Question Answering | Code Available | 1 |
| Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation | Sep 10, 2021 | Knowledge Distillation, Question Answering | Unverified | 0 |
| Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering | Sep 9, 2021 | Question Answering, Retrieval | Code Available | 1 |
| TxT: Crossmodal End-to-End Learning with Transformers | Sep 9, 2021 | Multimodal Reasoning, Question Answering | Unverified | 0 |
| Improved RAMEN: Towards Domain Generalization for Visual Question Answering | Sep 6, 2021 | Domain Generalization, Question Answering | Code Available | 0 |
| Weakly Supervised Relative Spatial Reasoning for Visual Question Answering | Sep 4, 2021 | Question Answering, Spatial Reasoning | Code Available | 0 |
| WebQA: Multihop and Multimodal QA | Sep 1, 2021 | Image Retrieval, Multimodal Reasoning | Code Available | 1 |
| On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering | Aug 28, 2021 | Graph Attention, Question Answering | Unverified | 0 |
| Auto-Parsing Network for Image Captioning and Visual Question Answering | Aug 24, 2021 | Image Captioning, Question Answering | Unverified | 0 |
| SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | Aug 24, 2021 | Image Captioning, Language Modeling | Code Available | 1 |
| Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling | Aug 20, 2021 | Data Ablation, Optical Character Recognition | Unverified | 0 |
| X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics | Aug 18, 2021 | Cross-Modal Retrieval, Decoder | Code Available | 1 |
| VALSE: A Task-Independent Benchmark for Vision and Language Models centered on Linguistic Phenomena | Aug 17, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Task-Oriented Multi-User Semantic Communications for VQA Task | Aug 16, 2021 | Question Answering, Semantic Communication | Code Available | 1 |
| BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis | Aug 10, 2021 | Language Modeling, Language Modelling | Code Available | 0 |
| Sparse Continuous Distributions and Fenchel-Young Losses | Aug 4, 2021 | Audio Classification, Question Answering | Code Available | 1 |
| LRRA: A Transparent Neural-Symbolic Reasoning Framework for Real-World Visual Question Answering | Aug 1, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| 利用图像描述与知识图谱增强表示的视觉问答 (Exploiting Image Captions and External Knowledge as Representation Enhancement for Visual Question Answering) | Aug 1, 2021 | Image Captioning, Question Answering | Unverified | 0 |
| In Factuality: Efficient Integration of Relevant Facts for Visual Question Answering | Aug 1, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Check It Again: Progressive Visual Question Answering via Visual Entailment | Aug 1, 2021 | Question Answering, Visual Entailment | Code Available | 1 |
| Towards Visual Question Answering on Pathology Images | Aug 1, 2021 | Decision Making, Question Answering | Code Available | 0 |
| Greedy Gradient Ensemble for Robust Visual Question Answering | Jul 27, 2021 | Question Answering, Visual Question Answering | Code Available | 1 |
| X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering | Jul 24, 2021 | Attribute, Out-of-Distribution Generalization | Code Available | 0 |
| Separating Skills and Concepts for Novel Visual Question Answering | Jul 19, 2021 | Attribute, Contrastive Learning | Code Available | 1 |
| How Much Can CLIP Benefit Vision-and-Language Tasks? | Jul 13, 2021 | Question Answering, Vision and Language Navigation | Code Available | 1 |
| Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question Answering | Jul 13, 2021 | Navigate, Question Answering | Code Available | 1 |
| Zero-shot Visual Question Answering using Knowledge Graph | Jul 12, 2021 | Knowledge Graphs, Question Answering | Code Available | 1 |
| MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering | Jul 7, 2021 | Medical Visual Question Answering, Missing Labels | Unverified | 0 |
| Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering | Jul 6, 2021 | Active Learning, Object Recognition | Code Available | 1 |
| Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory | Jul 4, 2021 | Question Answering, Scene Understanding | Code Available | 0 |
| Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs | Jun 28, 2021 | Question Answering, Task 2 | Unverified | 0 |
| Multimodal Few-Shot Learning with Frozen Language Models | Jun 25, 2021 | Few-Shot Learning, Language Modeling | Unverified | 0 |
| Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training | Jun 25, 2021 | Image-text Retrieval, Question Answering | Unverified | 0 |
| A Picture May Be Worth a Hundred Words for Visual Question Answering | Jun 25, 2021 | Data Augmentation, Descriptive | Unverified | 0 |
| Perception Matters: Detecting Perception Failures of VQA Models Using Metamorphic Testing | Jun 19, 2021 | Benchmarking, DNN Testing | Code Available | 1 |
| Predicting Human Scanpaths in Visual Question Answering | Jun 19, 2021 | Deep Reinforcement Learning, Question Answering | Code Available | 1 |
| RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words | Jun 19, 2021 | Decoder, Image Captioning | Code Available | 1 |
| VQA-Aid: Visual Question Answering for Post-Disaster Damage Assessment and Analysis | Jun 19, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Probing Image-Language Transformers for Verb Understanding | Jun 16, 2021 | Image Retrieval, Question Answering | Code Available | 1 |
| How Modular Should Neural Module Networks Be for Systematic Generalization? | Jun 15, 2021 | Question Answering, Systematic Generalization | Code Available | 0 |
| NAAQA: A Neural Architecture for Acoustic Question Answering | Jun 11, 2021 | Acoustic Question Answering, Question Answering | Code Available | 0 |