Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation | Mar 12, 2022 | Image Captioning, Knowledge Distillation | Unverified | 0
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks | Mar 9, 2022 | Decision Making, Explainable artificial intelligence | Code Available | 1
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant | Mar 8, 2022 | Visual Question Answering (VQA) | Code Available | 1
Barlow constrained optimization for Visual Question Answering | Mar 7, 2022 | Question Answering, Visual Question Answering | Code Available | 0
Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering | Mar 6, 2022 | Graph Attention, Question Answering | Code Available | 0
Modeling Coreference Relations in Visual Dialog | Mar 6, 2022 | Question Answering, Visual Dialog | Unverified | 0
Recent, rapid advancement in visual question answering architecture: a review | Mar 2, 2022 | Question Answering, Visual Question Answering | Unverified | 0
Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment | Mar 1, 2022 | Retrieval, Sentence | Unverified | 0
Joint Answering and Explanation for Visual Commonsense Reasoning | Feb 25, 2022 | Knowledge Distillation, Question Answering | Code Available | 0
On Modality Bias Recognition and Reduction | Feb 25, 2022 | Action Recognition, Multi-modal Classification | Code Available | 0
Measuring CLEVRness: Blackbox testing of Visual Reasoning Models | Feb 24, 2022 | Benchmarking, Diagnostic | Unverified | 0
Vision-Language Pre-Training with Triple Contrastive Learning | Feb 21, 2022 | Contrastive Learning, cross-modal alignment | Code Available | 2
OG-SGG: Ontology-Guided Scene Graph Generation. A Case Study in Transfer Learning for Telepresence Robotics | Feb 21, 2022 | BIG-bench Machine Learning, Graph Generation | Code Available | 0
RankDVQA: Deep VQA based on Ranking-inspired Hybrid Training | Feb 17, 2022 | Video Quality Assessment, Visual Question Answering (VQA) | Unverified | 0
Privacy Preserving Visual Question Answering | Feb 15, 2022 | Privacy Preserving, Question Answering | Unverified | 0
Delving Deeper into Cross-lingual Visual Question Answering | Feb 15, 2022 | Inductive Bias, Question Answering | Code Available | 0
An experimental study of the vision-bottleneck in VQA | Feb 14, 2022 | Object, Question Answering | Unverified | 0
Can Open Domain Question Answering Systems Answer Visual Knowledge Questions? | Feb 9, 2022 | Open-Domain Question Answering, Question Answering | Unverified | 0
NEWSKVQA: Knowledge-Aware News Video Question Answering | Feb 8, 2022 | Common Sense Reasoning, Management | Unverified | 0
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework | Feb 7, 2022 | Image Captioning, image-classification | Code Available | 0
Webly Supervised Concept Expansion for General Purpose Vision Models | Feb 4, 2022 | Human-Object Interaction Detection, Image Retrieval | Unverified | 0
Grounding Answers for Visual Questions Asked by Visually Impaired People | Feb 4, 2022 | Question Answering, Visual Question Answering | Code Available | 0
Compositionality as Lexical Symmetry | Jan 30, 2022 | Data Augmentation, Inductive Bias | Code Available | 0
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | Jan 28, 2022 | Image Captioning, Image-text matching | Code Available | 5
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages | Jan 27, 2022 | Cross-Modal Retrieval, Few-Shot Learning | Code Available | 1
Transformer Module Networks for Systematic Generalization in Visual Question Answering | Jan 27, 2022 | Question Answering, Systematic Generalization | Code Available | 0
Learning to Compose Diversified Prompts for Image Emotion Classification | Jan 26, 2022 | Classification, Emotion Classification | Unverified | 0
MGA-VQA: Multi-Granularity Alignment for Visual Question Answering | Jan 25, 2022 | Question Answering, Visual Question Answering | Unverified | 0
SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering | Jan 25, 2022 | Question Answering, Visual Question Answering | Unverified | 0
Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding | Jan 24, 2022 | Question Answering, Question Generation | Unverified | 0
KAT: A Knowledge Augmented Transformer for Vision-and-Language | Jan 16, 2022 | Answer Generation, Decoder | Unverified | 0
Retrieving Visual Facts For Few-Shot Visual Question Answering | Jan 16, 2022 | Language Modeling, Language Modelling | Unverified | 0
MANGO: Enhancing the Robustness of VQA Models via Adversarial Noise Generation | Jan 16, 2022 | Logical Reasoning, Question Answering | Unverified | 0
All You May Need for VQA are Image Captions | Jan 16, 2022 | All, Image Captioning | Unverified | 0
Task Formulation Matters When Learning Continuously: A Case Study in Visual Question Answering | Jan 16, 2022 | Continual Learning, Incremental Learning | Unverified | 0
Probing the Role of Positional Information in Vision-Language Models | Jan 16, 2022 | Contrastive Learning, Image-text matching | Unverified | 0
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks | Jan 15, 2022 | Question Answering, Visual Commonsense Reasoning | Unverified | 0
A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering | Jan 14, 2022 | Generative Question Answering, Image to text | Unverified | 0
Towards Automated Error Analysis: Learning to Characterize Errors | Jan 13, 2022 | Common Sense Reasoning, Meta-Learning | Unverified | 0
On the Efficacy of Co-Attention Transformer Layers in Visual Question Answering | Jan 11, 2022 | POS, Question Answering | Unverified | 0
Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training | Jan 11, 2022 | Decoder, Image Captioning | Unverified | 0
COIN: Counterfactual Image Generation for VQA Interpretation | Jan 10, 2022 | counterfactual, Image Generation | Unverified | 0
FAVER: Blind Quality Prediction of Variable Frame Rate Videos | Jan 5, 2022 | Cloud Computing, Video Quality Assessment | Code Available | 1
Interactive Attention AI to translate low light photos to captions for night scene understanding in women safety | Jan 4, 2022 | Decoder, Deep Learning | Unverified | 0
V-Doc: Visual Questions Answers With Documents | Jan 1, 2022 | Question Answering, Question Generation | Unverified | 0
Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering | Jan 1, 2022 | Generative Question Answering, Image to text | Unverified | 0
Query and Attention Augmentation for Knowledge-Based Explainable Reasoning | Jan 1, 2022 | Question Answering, Visual Question Answering | Code Available | 0
Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture | Jan 1, 2022 | Question Answering, Visual Question Answering | Unverified | 0
Maintaining Reasoning Consistency in Compositional Visual Question Answering | Jan 1, 2022 | Question Answering, Visual Question Answering | Code Available | 1
Multi-Image Visual Question Answering | Dec 27, 2021 | Question Answering, Visual Question Answering | Code Available | 0