Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly Apr 28, 2022 Question Answering Visual Question Answering
GRIT: General Robust Image Task Benchmark Apr 28, 2022 Instance Segmentation Keypoint Detection
RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning Apr 24, 2022 Human-Object Interaction Detection Object
Hypergraph Transformer: Weakly-supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering Apr 22, 2022 Question Answering Visual Question Answering
Attention in Reasoning: Dataset, Analysis, and Modeling Apr 20, 2022 Question Answering Visual Question Answering
SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering Apr 5, 2022 Data Augmentation Question Answering
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations Apr 5, 2022 Explanation Generation Question Answering
End-to-end Document Recognition and Understanding with Dessurt Mar 30, 2022 document understanding Visual Question Answering (VQA)
Learning to Answer Questions in Dynamic Audio-Visual Scenarios Mar 26, 2022 audio-visual learning Audio-visual Question Answering
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering Mar 17, 2022 Implicit Relations Question Answering
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks Mar 9, 2022 Decision Making Explainable artificial intelligence
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant Mar 8, 2022 Visual Question Answering (VQA)
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages Jan 27, 2022 Cross-Modal Retrieval Few-Shot Learning
FAVER: Blind Quality Prediction of Variable Frame Rate Videos Jan 5, 2022 Cloud Computing Video Quality Assessment
Maintaining Reasoning Consistency in Compositional Visual Question Answering Jan 1, 2022 Question Answering Visual Question Answering
LaTr: Layout-Aware Transformer for Scene-Text VQA Dec 23, 2021 Optical Character Recognition (OCR) Question Answering
Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation Dec 22, 2021 Common Sense Reasoning Question Answering
ScanQA: 3D Question Answering for Spatial Scene Understanding Dec 20, 2021 3D Question Answering (3D-QA) Object
Align and Prompt: Video-and-Language Pre-training with Entity Prompts Dec 17, 2021 cross-modal alignment Entity Alignment
Distilled Dual-Encoder Model for Vision-Language Understanding Dec 16, 2021 Image to text model
KAT: A Knowledge Augmented Transformer for Vision-and-Language Dec 16, 2021 Answer Generation Decoder
Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering Dec 14, 2021 Graph Matching Question Answering
Dual-Key Multimodal Backdoors for Visual Question Answering Dec 14, 2021 Question Answering Visual Question Answering
Change Detection Meets Visual Question Answering Dec 12, 2021 Answer Generation Change Detection
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering Dec 12, 2021 Question Answering Video Question Answering
MLP Architectures for Vision-and-Language Modeling: An Empirical Study Dec 8, 2021 Language Modeling Language Modelling
Debiased Visual Question Answering from Feature and Sample Perspectives Dec 1, 2021 Bias Detection Question Answering
Searching the Search Space of Vision Transformer Nov 29, 2021 Neural Architecture Search object-detection
Classification-Regression for Chart Comprehension Nov 29, 2021 Chart Question Answering Classification
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling Nov 23, 2021 Image Captioning Image Description
Many Heads but One Brain: Fusion Brain -- a Competition and a Single Multimodal Multitask Architecture Nov 22, 2021 Handwritten Text Recognition object-detection
Florence: A New Foundation Model for Computer Vision Nov 22, 2021 Action Classification Action Recognition
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts Nov 16, 2021 Cross-Modal Retrieval Image Captioning
An Empirical Study of Training End-to-End Vision-and-Language Transformers Nov 3, 2021 Cross-Modal Retrieval Decoder
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts Nov 3, 2021 Image Retrieval Image-text Retrieval
ViVQA: Vietnamese Visual Question Answering Nov 1, 2021 Question Answering Vietnamese Visual Question Answering
Introspective Distillation for Robust Question Answering Nov 1, 2021 counterfactual Inductive Bias
IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning Oct 25, 2021 Arithmetic Reasoning Mathematical Question Answering
Label-Descriptive Patterns and Their Application to Characterizing Classification Errors Oct 18, 2021 Descriptive named-entity-recognition
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models Oct 16, 2021 Image Captioning Language Modeling
Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos Oct 11, 2021 Audio-visual Question Answering Question Answering
Coarse-to-Fine Reasoning for Visual Question Answering Oct 6, 2021 Question Answering Visual Question Answering
Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering Oct 3, 2021 counterfactual Diagnostic
ProTo: Program-Guided Transformer for Program-Guided Tasks Oct 2, 2021 Decision Making Learning to Execute
The Spoon Is in the Sink: Assisting Visually Impaired People in the Kitchen Oct 1, 2021 Question Answering Visual Question Answering
Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images Oct 1, 2021 Question Answering Visual Question Answering
Does Vision-and-Language Pretraining Improve Lexical Grounding? Sep 21, 2021 Question Answering Visual Question Answering
ChipQA: No-Reference Video Quality Prediction via Space-Time Chips Sep 17, 2021 Video Quality Assessment Visual Question Answering (VQA)
xGQA: Cross-Lingual Visual Question Answering Sep 13, 2021 Cross-Lingual Transfer Language Modeling
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA Sep 10, 2021 Image Captioning Question Answering