| Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task | Aug 24, 2022 | Continual Learning, Question Answering | Code Available | 1 |
| FashionVQA: A Domain-Specific Visual Question Answering System | Aug 24, 2022 | Question Answering, Visual Question Answering | — Unverified | 0 |
| Bidirectional Contrastive Split Learning for Visual Question Answering | Aug 24, 2022 | Adversarial Attack, Backdoor Attack | — Unverified | 0 |
| How good are deep models in understanding the generated images? | Aug 23, 2022 | Object, Object Recognition | — Unverified | 0 |
| Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks | Aug 22, 2022 | All, Cross-Modal Retrieval | Code Available | 0 |
| VLMAE: Vision-Language Masked Autoencoder | Aug 19, 2022 | Image-text Retrieval, Language Modeling | — Unverified | 0 |
| ILLUME: Rationalizing Vision-Language Models through Human Interactions | Aug 17, 2022 | Image Captioning, Question Answering | Code Available | 0 |
| Understanding Attention for Vision-and-Language Tasks | Aug 17, 2022 | Image Generation, Image Retrieval | Code Available | 0 |
| Aesthetic Visual Question Answering of Photographs | Aug 10, 2022 | Question Answering, Sentiment Analysis | — Unverified | 0 |
| CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning | Aug 10, 2022 | Math, Mathematical Reasoning | Code Available | 1 |
| ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding | Aug 5, 2022 | Image Retrieval, Question Answering | Code Available | 1 |
| Generative Bias for Robust Visual Question Answering | Aug 1, 2022 | Knowledge Distillation, Question Answering | Code Available | 1 |
| Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base | Jul 27, 2022 | Question Answering, Semantic Similarity | — Unverified | 0 |
| LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection | Jul 26, 2022 | Decoder, Knowledge Graphs | Code Available | 1 |
| Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering | Jul 26, 2022 | Causal Inference, Question Answering | Code Available | 1 |
| WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models | Jul 25, 2022 | Common Sense Reasoning, General Knowledge | Code Available | 0 |
| Towards Complex Document Understanding By Discrete Reasoning | Jul 25, 2022 | document understanding, Question Answering | — Unverified | 0 |
| Is GPT-3 all you need for Visual Question Answering in Cultural Heritage? | Jul 25, 2022 | All, Question Answering | — Unverified | 0 |
| Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem | Jul 24, 2022 | Diagnostic, Question Answering | — Unverified | 0 |
| Semantic-aware Modular Capsule Routing for Visual Question Answering | Jul 21, 2022 | Question Answering, Visual Question Answering | — Unverified | 0 |
| Rethinking Data Augmentation for Robust Visual Question Answering | Jul 18, 2022 | Data Augmentation, Knowledge Distillation | Code Available | 1 |
| ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities | Jul 11, 2022 | Articles, Few-Shot Learning | Code Available | 1 |
| OVQA: A Clinically Generated Visual Question Answering Dataset | Jul 7, 2022 | Benchmarking, Medical Visual Question Answering | — Unverified | 0 |
| Knowing Earlier what Right Means to You: A Comprehensive VQA Dataset for Grounding Relative Directions via Multi-Task Learning | Jul 6, 2022 | Diagnostic, Multi-Task Learning | Code Available | 0 |
| Weakly Supervised Grounding for VQA in Vision-Language Transformers | Jul 5, 2022 | Question Answering, Representation Learning | Code Available | 1 |