| V-Doc : Visual questions answers with Documents | May 27, 2022 | Question Answering, Question Generation | Unverified | 0 |
| Guiding Visual Question Answering with Attention Priors | May 25, 2022 | Question Answering, Visual Grounding | Unverified | 0 |
| Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization | May 24, 2022 | Image Captioning, Out-of-Distribution Generalization | Unverified | 0 |
| On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization | May 24, 2022 | Descriptive, Image Captioning | Unverified | 0 |
| VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering | May 23, 2022 | Knowledge Graphs, Question Answering | Unverified | 0 |
| Gender and Racial Bias in Visual Question Answering Datasets | May 17, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| A Neuro-Symbolic ASP Pipeline for Visual Question Answering | May 16, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |
| Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures | May 10, 2022 | AutoML, BIG-bench Machine Learning | Unverified | 0 |
| Joint learning of object graph and relation graph for visual question answering | May 9, 2022 | Attribute, Graph Neural Network | Unverified | 0 |
| From Easy to Hard: Learning Language-guided Curriculum for Visual Question Answering on Remote Sensing Data | May 6, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| QLEVR: A Diagnostic Dataset for Quantificational Language and Elementary Visual Reasoning | May 6, 2022 | Diagnostic, Question Answering | Code Available | 0 |
| What is Right for Me is Not Yet Right for You: A Dataset for Grounding Relative Directions via Multi-Task Learning | May 5, 2022 | Multi-Task Learning, Question Answering | Code Available | 0 |
| Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering | May 2, 2022 | Decoder, Image Captioning | Unverified | 0 |
| Vision-Language Pretraining: Current Trends and the Future | May 1, 2022 | Question Answering, Representation Learning | Unverified | 0 |
| ViLMedic: a framework for research at the intersection of vision and language in medical AI | May 1, 2022 | Medical Visual Question Answering, Question Answering | Unverified | 0 |
| DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering | May 1, 2022 | document understanding, Open-Domain Question Answering | Unverified | 0 |
| Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks | Apr 22, 2022 | Question Answering, Visual Commonsense Reasoning | Unverified | 0 |
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Apr 18, 2022 | cross-modal alignment, Document AI | Code Available | 0 |
| Attention Mechanism based Cognition-level Scene Understanding | Apr 17, 2022 | Question Answering, Scene Understanding | Unverified | 0 |
| Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning | Apr 15, 2022 | Contrastive Learning, Question Answering | Unverified | 0 |
| Question-Driven Graph Fusion Network For Visual Question Answering | Apr 3, 2022 | Graph Attention, Object | Unverified | 0 |
| Co-VQA : Answering by Interactive Sub Question Sequence | Apr 2, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| SimVQA: Exploring Simulated Environments for Visual Question Answering | Mar 31, 2022 | Data Augmentation, Diversity | Unverified | 0 |
| VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers | Mar 30, 2022 | Question Answering, Visual Commonsense Reasoning | Code Available | 0 |
| Single-Stream Multi-Level Alignment for Vision-Language Pretraining | Mar 27, 2022 | Image-text Retrieval, Question Answering | Code Available | 0 |