Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images Jan 1, 2021 Attribute Multiple Instance Learning
TRAR: Routing the Attention Spans in Transformer for Visual Question Answering Jan 1, 2021 Question Answering Referring Expression
Detecting Hate Speech in Multi-modal Memes Dec 29, 2020 Binary Classification Hate Speech Detection
Overcoming Language Priors with Self-supervised Learning for Visual Question Answering Dec 17, 2020 Question Answering Self-Supervised Learning
Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding Dec 14, 2020 Question Answering Visual Question Answering
CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions Dec 8, 2020 counterfactual Descriptive
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption Dec 8, 2020 Caption Generation Language Modeling
FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding Dec 5, 2020 image-classification Image Classification
Just Ask: Learning to Answer Questions from Millions of Narrated Videos Dec 1, 2020 Question Answering Question Generation
Point and Ask: Incorporating Pointing into Visual Question Answering Nov 27, 2020 Question Answering Visual Question Answering
Patch-VQ: 'Patching Up' the Video Quality Problem Nov 27, 2020 Video Quality Assessment Visual Question Answering (VQA)
Transformation Driven Visual Reasoning Nov 26, 2020 Attribute Triplet
Large Scale Multimodal Classification Using an Ensemble of Transformer Models and Co-Attention Nov 23, 2020 Classification General Classification
LRTA: A Transparent Neural-Symbolic Reasoning Framework with Modular Supervision for Visual Question Answering Nov 21, 2020 Answer Generation Question Answering
Disentangling 3D Prototypical Networks For Few-Shot Concept Learning Nov 6, 2020 3D geometry 3D Object Detection
ConceptBert: Concept-Aware Representation for Visual Question Answering Nov 1, 2020 Common Sense Reasoning Question Answering
Learning to Contrast the Counterfactual Samples for Robust Visual Question Answering Nov 1, 2020 Contrastive Learning counterfactual
MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering Oct 27, 2020 Diagnostic Question Answering
ST-GREED: Space-Time Generalized Entropic Differences for Frame Rate Dependent Video Quality Prediction Oct 26, 2020 Video Quality Assessment Visual Question Answering (VQA)
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering Oct 24, 2020 Optical Character Recognition Optical Character Recognition (OCR)
Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies Oct 21, 2020 Question Answering Visual Question Answering
Bayesian Attention Modules Oct 20, 2020 Image Captioning Machine Translation
Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs Oct 15, 2020 Language Modeling Language Modelling
Contrast and Classify: Training Robust VQA Models Oct 13, 2020 Contrastive Learning Data Augmentation
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers Sep 23, 2020 Image Captioning Image Generation
MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering Sep 18, 2020 Out-of-Distribution Generalization Question Answering
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports Sep 3, 2020 Image-text Retrieval Medical Visual Question Answering
A Dataset and Baselines for Visual Question Answering on Art Aug 28, 2020 Question Answering Question Generation
DeVLBert: Learning Deconfounded Visio-Linguistic Representations Aug 16, 2020 Image Retrieval Question Answering
Spatially Aware Multimodal Transformers for TextVQA Jul 23, 2020 Optical Character Recognition (OCR) Spatial Reasoning
Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering Jul 19, 2020 Adversarial Attack Data Augmentation
Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions Jul 17, 2020 Question Answering Video Question Answering
Learning to Discretely Compose Reasoning Module Networks for Video Captioning Jul 17, 2020 Decoder Question Answering
DocVQA: A Dataset for VQA on Document Images Jul 1, 2020 Question Answering Reading Comprehension
Visual Question Generation from Radiology Images Jul 1, 2020 Image Augmentation Question Generation
Ontology-guided Semantic Composition for Zero-Shot Learning Jun 30, 2020 image-classification Image Classification
Graph Optimal Transport for Cross-Domain Alignment Jun 26, 2020 Graph Matching Image Captioning
Sparse and Continuous Attention Mechanisms Jun 12, 2020 Machine Translation Question Answering
Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning Jun 11, 2020 Question Answering Reinforcement Learning (RL)
Large-Scale Adversarial Training for Vision-and-Language Representation Learning Jun 11, 2020 Image-text Retrieval Question Answering
Roses Are Red, Violets Are Blue... but Should Vqa Expect Them To? Jun 9, 2020 Question Answering Visual Question Answering
Counterfactual VQA: A Cause-Effect Look at Language Bias Jun 8, 2020 Causal Inference counterfactual
Attention-Based Context Aware Reasoning for Situation Recognition Jun 1, 2020 Action Recognition Fine-grained Action Recognition
Structured Multimodal Attentions for TextVQA Jun 1, 2020 Graph Attention Optical Character Recognition (OCR)
UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content May 29, 2020 Benchmarking feature selection
Cross-Modality Relevance for Reasoning on Language and Vision May 12, 2020 Question Answering Visual Question Answering
COBRA: Contrastive Bi-Modal Representation Algorithm May 7, 2020 Cross-Modal Retrieval Image Captioning
Dynamic Language Binding in Relational Visual Reasoning Apr 30, 2020 Object Question Answering
Deep Multimodal Neural Architecture Search Apr 25, 2020 Decoder Image-text matching
Visual Grounding Methods for VQA are Working for the Wrong Reasons! Apr 12, 2020 Question Answering Visual Grounding