Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering Sep 9, 2021 Question Answering Retrieval
GeneAnnotator: A Semi-automatic Annotation Tool for Visual Scene Graph Sep 6, 2021 Graph Generation Graph Learning
WebQA: Multihop and Multimodal QA Sep 1, 2021 Image Retrieval Multimodal Reasoning
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision Aug 24, 2021 Image Captioning Language Modeling
Blindly Assess Quality of In-the-Wild Videos via Quality-aware Pre-training and Motion Perception Aug 19, 2021 Action Recognition Image Quality Assessment
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics Aug 18, 2021 Cross-Modal Retrieval Decoder
Task-Oriented Multi-User Semantic Communications for VQA Task Aug 16, 2021 Question Answering Semantic Communication
Sparse Continuous Distributions and Fenchel-Young Losses Aug 4, 2021 Audio Classification Question Answering
Check It Again: Progressive Visual Question Answering via Visual Entailment Aug 1, 2021 Question Answering Visual Entailment
Greedy Gradient Ensemble for Robust Visual Question Answering Jul 27, 2021 Question Answering Visual Question Answering
Separating Skills and Concepts for Novel Visual Question Answering Jul 19, 2021 Attribute Contrastive Learning
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation Jul 16, 2021 Cross-Modal Retrieval Grounded language learning
Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question Answering Jul 13, 2021 Navigate Question Answering
How Much Can CLIP Benefit Vision-and-Language Tasks? Jul 13, 2021 Question Answering Vision and Language Navigation
Zero-shot Visual Question Answering using Knowledge Graph Jul 12, 2021 Knowledge Graphs Question Answering
DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering Jul 10, 2021 Graph Attention Question Answering
Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering Jul 6, 2021 Active Learning Object Recognition
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions Jun 19, 2021 Question Answering Video Question Answering
Perception Matters: Detecting Perception Failures of VQA Models Using Metamorphic Testing Jun 19, 2021 Benchmarking DNN Testing
Predicting Human Scanpaths in Visual Question Answering Jun 19, 2021 Deep Reinforcement Learning Question Answering
RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words Jun 19, 2021 Decoder Image Captioning
Probing Image-Language Transformers for Verb Understanding Jun 16, 2021 Image Retrieval Question Answering
Check It Again: Progressive Visual Question Answering via Visual Entailment Jun 8, 2021 Question Answering Visual Entailment
Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training May 24, 2021 Image Captioning Medical Visual Question Answering
Multiple Meta-model Quantifying for Medical Visual Question Answering May 19, 2021 Medical Visual Question Answering Meta-Learning
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions May 18, 2021 Question Answering Video Question Answering
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules May 11, 2021 Question Answering Visual Question Answering
Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning May 10, 2021 Arithmetic Reasoning Geometry Problem Solving
Passage Retrieval for Outside-Knowledge Visual Question Answering May 9, 2021 Image Captioning Object
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding Apr 26, 2021 Generalized Referring Expression Comprehension Phrase Grounding
RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition Apr 24, 2021 Image Captioning Object Recognition
GraghVQA: Language-Guided Graph Neural Networks for Graph-based Visual Question Answering Apr 20, 2021 Graph Neural Network Graph Question Answering
Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering Apr 7, 2021 Question Answering Visual Question Answering
MMBERT: Multimodal BERT Pretraining for Improved Medical VQA Apr 3, 2021 Language Modeling Language Modelling
VisQA: X-raying Vision and Language Reasoning in Transformers Apr 2, 2021 Question Answering Visual Question Answering
Towards General Purpose Vision Systems Apr 1, 2021 Question Answering Visual Question Answering
Are Bias Mitigation Techniques for Deep Learning Effective? Apr 1, 2021 Deep Learning Question Answering
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers Mar 29, 2021 Decoder Image Segmentation
SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events Mar 29, 2021 Autonomous Vehicles Benchmarking
On the hidden treasure of dialog in video question answering Mar 26, 2021 Question Answering Video Question Answering
Multi-Modal Answer Validation for Knowledge-Based VQA Mar 23, 2021 Question Answering Retrieval
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer Feb 18, 2021 Decoder Document Image Classification
SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering Feb 18, 2021 Medical Visual Question Answering Question Answering
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts Feb 17, 2021 Caption Generation Diversity
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling Feb 11, 2021 Question Answering Retrieval
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision Feb 5, 2021 Cross-Modal Retrieval Image Retrieval
Unifying Vision-and-Language Tasks via Text Generation Feb 4, 2021 Conditional Text Generation Decoder
VisualMRC: Machine Reading Comprehension on Document Images Jan 27, 2021 Machine Reading Comprehension Natural Language Understanding
Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images Jan 1, 2021 Attribute Multiple Instance Learning
TRAR: Routing the Attention Spans in Transformer for Visual Question Answering Jan 1, 2021 Question Answering Referring Expression