Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain? | Dec 27, 2021 | Articles, Medical Visual Question Answering | Unverified 0
LaTr: Layout-Aware Transformer for Scene-Text VQA | Dec 23, 2021 | Optical Character Recognition (OCR), Question Answering | Code Available 1
Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation | Dec 22, 2021 | Common Sense Reasoning, Question Answering | Code Available 1
ScanQA: 3D Question Answering for Spatial Scene Understanding | Dec 20, 2021 | 3D Question Answering (3D-QA), Object | Code Available 1
General Greedy De-bias Learning | Dec 20, 2021 | image-classification, Image Classification | Code Available 0
Task-Oriented Multi-User Semantic Communications | Dec 19, 2021 | Image Retrieval, Machine Translation | Unverified 0
Zero-shot and Few-shot Learning with Knowledge Graphs: A Comprehensive Survey | Dec 18, 2021 | Data Augmentation, Few-Shot Learning | Unverified 0
Understanding Attention for Vision-and-Language Tasks | Dec 17, 2021 | Image Generation, Image Retrieval | Unverified 0
Align and Prompt: Video-and-Language Pre-training with Entity Prompts | Dec 17, 2021 | cross-modal alignment, Entity Alignment | Code Available 1
KAT: A Knowledge Augmented Transformer for Vision-and-Language | Dec 16, 2021 | Answer Generation, Decoder | Code Available 1
Distilled Dual-Encoder Model for Vision-Language Understanding | Dec 16, 2021 | Image to text model | Code Available 1
3D Question Answering | Dec 15, 2021 | 3D geometry, Question Answering | Unverified 0
Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering | Dec 14, 2021 | Graph Matching, Question Answering | Code Available 1
Dual-Key Multimodal Backdoors for Visual Question Answering | Dec 14, 2021 | Question Answering, Visual Question Answering | Code Available 1
Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection | Dec 13, 2021 | Common Sense Reasoning, Knowledge Graph Embeddings | Unverified 0
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering | Dec 12, 2021 | Question Answering, Video Question Answering | Code Available 1
Change Detection Meets Visual Question Answering | Dec 12, 2021 | Answer Generation, Change Detection | Code Available 1
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation | Dec 10, 2021 | Image-text matching, Image-text Retrieval | Unverified 0
MLP Architectures for Vision-and-Language Modeling: An Empirical Study | Dec 8, 2021 | Language Modeling, Language Modelling | Code Available 1
MoCA: Incorporating Multi-stage Domain Pretraining and Cross-guided Multimodal Attention for Textbook Question Answering | Dec 6, 2021 | Language Modelling, Question Answering | Unverified 0
eaVQA: An Experimental Analysis on Visual Question Answering Models | Dec 1, 2021 | Question Answering, Visual Question Answering | Unverified 0
Curriculum Learning Effectively Improves Low Data VQA | Dec 1, 2021 | Question Answering, Visual Question Answering | Unverified 0
Debiased Visual Question Answering from Feature and Sample Perspectives | Dec 1, 2021 | Bias Detection, Question Answering | Code Available 1
Scallop: From Probabilistic Deductive Databases to Scalable Differentiable Reasoning | Dec 1, 2021 | Logical Reasoning, Question Answering | Unverified 0
Robust Visual Reasoning via Language Guided Neural Module Networks | Dec 1, 2021 | Question Answering, Referring Expression | Unverified 0
OCR-free Document Understanding Transformer | Nov 30, 2021 | Document Image Classification, document understanding | Code Available 3
Searching the Search Space of Vision Transformer | Nov 29, 2021 | Neural Architecture Search, object-detection | Code Available 1
Classification-Regression for Chart Comprehension | Nov 29, 2021 | Chart Question Answering, Classification | Code Available 1
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering | Nov 29, 2021 | Diversity, Question Answering | Unverified 0
Scene Graph Generation with Geometric Context | Nov 25, 2021 | Activity Recognition, Graph Generation | Unverified 0
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling | Nov 23, 2021 | Image Captioning, Image Description | Code Available 1
Many Heads but One Brain: Fusion Brain -- a Competition and a Single Multimodal Multitask Architecture | Nov 22, 2021 | Handwritten Text Recognition, object-detection | Code Available 1
Florence: A New Foundation Model for Computer Vision | Nov 22, 2021 | Action Classification, Action Recognition | Code Available 1
A Confidence-Based Interface for Neuro-Symbolic Visual Question Answering | Nov 21, 2021 | Question Answering, Translation | Unverified 0
Medical Visual Question Answering: A Survey | Nov 19, 2021 | Medical Visual Question Answering, Question Answering | Unverified 0
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning | Nov 19, 2021 | Image Captioning, Image-text matching | Unverified 0
Blind VQA on 360° Video via Progressively Learning from Pixels, Frames and Video | Nov 18, 2021 | Visual Question Answering (VQA) | Code Available 0
Achieving Human Parity on Visual Question Answering | Nov 17, 2021 | Question Answering, Visual Question Answering | Unverified 0
Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation | Nov 16, 2021 | Image Captioning, Knowledge Distillation | Unverified 0
Co-VQA : Answering by Interactive Sub Question Sequence | Nov 16, 2021 | Question Answering, Visual Question Answering | Unverified 0
Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base | Nov 16, 2021 | Question Answering, Semantic Similarity | Unverified 0
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models | Nov 16, 2021 | Language Modeling, Language Modelling | Unverified 0
Question-Led Semantic Structure Enhanced Attentions for VQA | Nov 16, 2021 | Question Answering, Visual Question Answering | Unverified 0
ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities | Nov 16, 2021 | Articles, Face Recognition | Code Available 0
Breaking Down Questions for Outside-Knowledge Visual Question Answering | Nov 16, 2021 | Graph Neural Network, Question Answering | Unverified 0
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts | Nov 16, 2021 | Cross-Modal Retrieval, Image Captioning | Code Available 1
Language bias in Visual Question Answering: A Survey and Taxonomy | Nov 16, 2021 | Question Answering, Visual Question Answering | Unverified 0
Document AI: Benchmarks, Models and Applications | Nov 16, 2021 | Deep Learning, Document AI | Unverified 0
No-Reference Video Quality Assessment Based on Benford’s Law and Perceptual Features | Nov 12, 2021 | No-Reference Image Quality Assessment, Video Quality Assessment | Code Available 0
Graph Relation Transformer: Incorporating pairwise object features into the Transformer architecture | Nov 11, 2021 | Graph Attention, Question Answering | Unverified 0