| Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning | May 31, 2022 | Common Sense Reasoning, Graph Generation | Code Available | 1 |
| Visual Superordinate Abstraction for Robust Concept Learning | May 28, 2022 | Attribute, Question Answering | Unverified | 0 |
| V-Doc : Visual questions answers with Documents | May 27, 2022 | Question Answering, Question Generation | Unverified | 0 |
| Guiding Visual Question Answering with Attention Priors | May 25, 2022 | Question Answering, Visual Grounding | Unverified | 0 |
| Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization | May 24, 2022 | Image Captioning, Out-of-Distribution Generalization | Unverified | 0 |
| mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | May 24, 2022 | Computational Efficiency, cross-modal alignment | Code Available | 1 |
| On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization | May 24, 2022 | Descriptive, Image Captioning | Unverified | 0 |
| VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering | May 23, 2022 | Knowledge Graphs, Question Answering | Unverified | 0 |
| Gender and Racial Bias in Visual Question Answering Datasets | May 17, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| A Neuro-Symbolic ASP Pipeline for Visual Question Answering | May 16, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |
| Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures | May 10, 2022 | AutoML, BIG-bench Machine Learning | Unverified | 0 |
| Learning to Answer Visual Questions from Web Videos | May 10, 2022 | Dataset Generation, Question Answering | Code Available | 1 |
| Joint learning of object graph and relation graph for visual question answering | May 9, 2022 | Attribute, Graph Neural Network | Unverified | 0 |
| QLEVR: A Diagnostic Dataset for Quantificational Language and Elementary Visual Reasoning | May 6, 2022 | Diagnostic, Question Answering | Code Available | 0 |
| From Easy to Hard: Learning Language-guided Curriculum for Visual Question Answering on Remote Sensing Data | May 6, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| What is Right for Me is Not Yet Right for You: A Dataset for Grounding Relative Directions via Multi-Task Learning | May 5, 2022 | Multi-Task Learning, Question Answering | Code Available | 0 |
| Declaration-based Prompt Tuning for Visual Question Answering | May 5, 2022 | Image-text matching, Language Modeling | Code Available | 1 |
| All You May Need for VQA are Image Captions | May 4, 2022 | All, Image Captioning | Code Available | 3 |
| CoCa: Contrastive Captioners are Image-Text Foundation Models | May 4, 2022 | Action Classification, Decoder | Code Available | 1 |
| Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering | May 2, 2022 | Decoder, Image Captioning | Unverified | 0 |
| Vision-Language Pretraining: Current Trends and the Future | May 1, 2022 | Question Answering, Representation Learning | Unverified | 0 |
| ViLMedic: a framework for research at the intersection of vision and language in medical AI | May 1, 2022 | Medical Visual Question Answering, Question Answering | Unverified | 0 |
| DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering | May 1, 2022 | document understanding, Open-Domain Question Answering | Unverified | 0 |
| Flamingo: a Visual Language Model for Few-Shot Learning | Apr 29, 2022 | Few-Shot Learning, Generative Visual Question Answering | Code Available | 4 |
| Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly | Apr 28, 2022 | Question Answering, Visual Question Answering | Code Available | 1 |
| GRIT: General Robust Image Task Benchmark | Apr 28, 2022 | Instance Segmentation, Keypoint Detection | Code Available | 1 |
| Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks | Apr 22, 2022 | Question Answering, Visual Commonsense Reasoning | Unverified | 0 |
| Hypergraph Transformer: Weakly-supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering | Apr 22, 2022 | Question Answering, Visual Question Answering | Code Available | 1 |
| Attention in Reasoning: Dataset, Analysis, and Modeling | Apr 20, 2022 | Question Answering, Visual Question Answering | Code Available | 1 |
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Apr 18, 2022 | cross-modal alignment, Document AI | Code Available | 0 |
| Attention Mechanism based Cognition-level Scene Understanding | Apr 17, 2022 | Question Answering, Scene Understanding | Unverified | 0 |
| Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning | Apr 15, 2022 | Contrastive Learning, Question Answering | Unverified | 0 |
| CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations | Apr 5, 2022 | Explanation Generation, Question Answering | Code Available | 1 |
| SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering | Apr 5, 2022 | Data Augmentation, Question Answering | Code Available | 1 |
| Question-Driven Graph Fusion Network For Visual Question Answering | Apr 3, 2022 | Graph Attention, Object | Unverified | 0 |
| Co-VQA : Answering by Interactive Sub Question Sequence | Apr 2, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| SimVQA: Exploring Simulated Environments for Visual Question Answering | Mar 31, 2022 | Data Augmentation, Diversity | Unverified | 0 |
| VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers | Mar 30, 2022 | Question Answering, Visual Commonsense Reasoning | Code Available | 0 |
| Single-Stream Multi-Level Alignment for Vision-Language Pretraining | Mar 27, 2022 | Image-text Retrieval, Question Answering | Code Available | 0 |
| Learning to Answer Questions in Dynamic Audio-Visual Scenarios | Mar 26, 2022 | audio-visual learning, Audio-visual Question Answering | Code Available | 1 |
| A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration | Mar 25, 2022 | image-classification, Image Classification | Code Available | 1 |
| Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering | Mar 24, 2022 | GPU, Question Answering | Code Available | 0 |
| Towards Escaping from Language Bias and OCR Error: Semantics-Centered Text Visual Question Answering | Mar 24, 2022 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering | Mar 17, 2022 | Implicit Relations, Question Answering | Code Available | 1 |
| Can you even tell left from right? Presenting a new challenge for VQA | Mar 15, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment | Mar 14, 2022 | parameter-efficient fine-tuning, Question Answering | Unverified | 0 |
| Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation | Mar 12, 2022 | Image Captioning, Knowledge Distillation | Unverified | 0 |
| Barlow constrained optimization for Visual Question Answering | Mar 7, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |
| Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering | Mar 6, 2022 | Graph Attention, Question Answering | Code Available | 0 |
| Modeling Coreference Relations in Visual Dialog | Mar 6, 2022 | Question Answering, Visual Dialog | Unverified | 0 |