| Robustness through Data Augmentation Loss Consistency | Oct 21, 2021 | Multi-domain Dialogue State Tracking, Visual Question Answering | Code Available | 0 |
| Single-Modal Entropy based Active Learning for Visual Question Answering | Oct 21, 2021 | Active Learning, Question Answering | Unverified | 0 |
| Towards Language-guided Visual Recognition via Dynamic Convolutions | Oct 17, 2021 | Question Answering, Referring Expression | Code Available | 0 |
| xGQA: Cross-Lingual Visual Question Answering | Oct 16, 2021 | Cross-Lingual Transfer, Language Modeling | Unverified | 0 |
| MMIU: Dataset for Visual Intent Understanding in Multimodal Assistants | Oct 13, 2021 | intent-classification, Intent Classification | Unverified | 0 |
| Improving Users' Mental Model with Attention-directed Counterfactual Edits | Oct 13, 2021 | counterfactual, Question Answering | Unverified | 0 |
| Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking | Oct 11, 2021 | Benchmarking, Question Answering | Code Available | 0 |
| Asking questions on handwritten document collections | Oct 2, 2021 | Optical Character Recognition (OCR), Question Answering | Unverified | 0 |
| Breaking Down Questions for Outside-Knowledge VQA | Sep 29, 2021 | Graph Neural Network, Question Answering | Unverified | 0 |
| Variational Disentangled Attention for Regularized Visual Dialog | Sep 29, 2021 | Question Answering, Visual Dialog | Unverified | 0 |
| Crossformer: Transformer with Alternated Cross-Layer Guidance | Sep 29, 2021 | Inductive Bias, Machine Translation | Unverified | 0 |
| How Much Can CLIP Benefit Vision-and-Language Tasks? | Sep 29, 2021 | Question Answering, Visual Entailment | Unverified | 0 |
| Measuring CLEVRness: Black-box Testing of Visual Reasoning Models | Sep 29, 2021 | Benchmarking, Diagnostic | Unverified | 0 |
| VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering | Sep 27, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Multimodal Integration of Human-Like Attention in Visual Question Answering | Sep 27, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| How to find a good image-text embedding for remote sensing visual question answering? | Sep 24, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering | Sep 15, 2021 | Image Captioning, Knowledge Graphs | Code Available | 0 |
| Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering | Sep 13, 2021 | Data Augmentation, Question Answering | Code Available | 0 |
| Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation | Sep 10, 2021 | Knowledge Distillation, Question Answering | Unverified | 0 |
| TxT: Crossmodal End-to-End Learning with Transformers | Sep 9, 2021 | Multimodal Reasoning, Question Answering | Unverified | 0 |
| Improved RAMEN: Towards Domain Generalization for Visual Question Answering | Sep 6, 2021 | Domain Generalization, Question Answering | Code Available | 0 |
| Weakly Supervised Relative Spatial Reasoning for Visual Question Answering | Sep 4, 2021 | Question Answering, Spatial Reasoning | Code Available | 0 |
| On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering | Aug 28, 2021 | Graph Attention, Question Answering | Unverified | 0 |
| Auto-Parsing Network for Image Captioning and Visual Question Answering | Aug 24, 2021 | Image Captioning, Question Answering | Unverified | 0 |
| Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling | Aug 20, 2021 | Data Ablation, Optical Character Recognition | Unverified | 0 |