| V-Doc : Visual questions answers with Documents | May 27, 2022 | Question Answering, Question Generation | Unverified | 0 |
| Guiding Visual Question Answering with Attention Priors | May 25, 2022 | Question Answering, Visual Grounding | Unverified | 0 |
| Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization | May 24, 2022 | Image Captioning, Out-of-Distribution Generalization | Unverified | 0 |
| On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization | May 24, 2022 | Descriptive, Image Captioning | Unverified | 0 |
| VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering | May 23, 2022 | Knowledge Graphs, Question Answering | Unverified | 0 |
| Gender and Racial Bias in Visual Question Answering Datasets | May 17, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| A Neuro-Symbolic ASP Pipeline for Visual Question Answering | May 16, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |
| Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures | May 10, 2022 | AutoML, BIG-bench Machine Learning | Unverified | 0 |
| Joint learning of object graph and relation graph for visual question answering | May 9, 2022 | Attribute, Graph Neural Network | Unverified | 0 |
| From Easy to Hard: Learning Language-guided Curriculum for Visual Question Answering on Remote Sensing Data | May 6, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| QLEVR: A Diagnostic Dataset for Quantificational Language and Elementary Visual Reasoning | May 6, 2022 | Diagnostic, Question Answering | Code Available | 0 |
| What is Right for Me is Not Yet Right for You: A Dataset for Grounding Relative Directions via Multi-Task Learning | May 5, 2022 | Multi-Task Learning, Question Answering | Code Available | 0 |
| Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering | May 2, 2022 | Decoder, Image Captioning | Unverified | 0 |
| Vision-Language Pretraining: Current Trends and the Future | May 1, 2022 | Question Answering, Representation Learning | Unverified | 0 |
| ViLMedic: a framework for research at the intersection of vision and language in medical AI | May 1, 2022 | Medical Visual Question Answering, Question Answering | Unverified | 0 |
| DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering | May 1, 2022 | document understanding, Open-Domain Question Answering | Unverified | 0 |
| Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks | Apr 22, 2022 | Question Answering, Visual Commonsense Reasoning | Unverified | 0 |
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Apr 18, 2022 | cross-modal alignment, Document AI | Code Available | 0 |
| Attention Mechanism based Cognition-level Scene Understanding | Apr 17, 2022 | Question Answering, Scene Understanding | Unverified | 0 |
| Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning | Apr 15, 2022 | Contrastive Learning, Question Answering | Unverified | 0 |
| Question-Driven Graph Fusion Network For Visual Question Answering | Apr 3, 2022 | Graph Attention, Object | Unverified | 0 |
| Co-VQA : Answering by Interactive Sub Question Sequence | Apr 2, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| SimVQA: Exploring Simulated Environments for Visual Question Answering | Mar 31, 2022 | Data Augmentation, Diversity | Unverified | 0 |
| VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers | Mar 30, 2022 | Question Answering, Visual Commonsense Reasoning | Code Available | 0 |
| Single-Stream Multi-Level Alignment for Vision-Language Pretraining | Mar 27, 2022 | Image-text Retrieval, Question Answering | Code Available | 0 |
| Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering | Mar 24, 2022 | GPU, Question Answering | Code Available | 0 |
| Towards Escaping from Language Bias and OCR Error: Semantics-Centered Text Visual Question Answering | Mar 24, 2022 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Can you even tell left from right? Presenting a new challenge for VQA | Mar 15, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment | Mar 14, 2022 | parameter-efficient fine-tuning, Question Answering | Unverified | 0 |
| Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation | Mar 12, 2022 | Image Captioning, Knowledge Distillation | Unverified | 0 |
| Barlow constrained optimization for Visual Question Answering | Mar 7, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |
| Modeling Coreference Relations in Visual Dialog | Mar 6, 2022 | Question Answering, Visual Dialog | Unverified | 0 |
| Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering | Mar 6, 2022 | Graph Attention, Question Answering | Code Available | 0 |
| Recent, rapid advancement in visual question answering architecture: a review | Mar 2, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| On Modality Bias Recognition and Reduction | Feb 25, 2022 | Action Recognition, Multi-modal Classification | Code Available | 0 |
| Joint Answering and Explanation for Visual Commonsense Reasoning | Feb 25, 2022 | Knowledge Distillation, Question Answering | Code Available | 0 |
| Measuring CLEVRness: Blackbox testing of Visual Reasoning Models | Feb 24, 2022 | Benchmarking, Diagnostic | Unverified | 0 |
| OG-SGG: Ontology-Guided Scene Graph Generation. A Case Study in Transfer Learning for Telepresence Robotics | Feb 21, 2022 | BIG-bench Machine Learning, Graph Generation | Code Available | 0 |
| Privacy Preserving Visual Question Answering | Feb 15, 2022 | Privacy Preserving, Question Answering | Unverified | 0 |
| Delving Deeper into Cross-lingual Visual Question Answering | Feb 15, 2022 | Inductive Bias, Question Answering | Code Available | 0 |
| An experimental study of the vision-bottleneck in VQA | Feb 14, 2022 | Object, Question Answering | Unverified | 0 |
| Can Open Domain Question Answering Systems Answer Visual Knowledge Questions? | Feb 9, 2022 | Open-Domain Question Answering, Question Answering | Unverified | 0 |
| NEWSKVQA: Knowledge-Aware News Video Question Answering | Feb 8, 2022 | Common Sense Reasoning, Management | Unverified | 0 |
| OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework | Feb 7, 2022 | Image Captioning, image-classification | Code Available | 0 |
| Grounding Answers for Visual Questions Asked by Visually Impaired People | Feb 4, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |
| Compositionality as Lexical Symmetry | Jan 30, 2022 | Data Augmentation, Inductive Bias | Code Available | 0 |
| Transformer Module Networks for Systematic Generalization in Visual Question Answering | Jan 27, 2022 | Question Answering, Systematic Generalization | Code Available | 0 |
| Learning to Compose Diversified Prompts for Image Emotion Classification | Jan 26, 2022 | Classification, Emotion Classification | Unverified | 0 |
| MGA-VQA: Multi-Granularity Alignment for Visual Question Answering | Jan 25, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering | Jan 25, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |