QSAN: A Near-term Achievable Quantum Self-Attention Network | Jul 14, 2022 | Binary Classification image-classification | Unverified 0
Multiview Contrastive Learning for Completely Blind Video Quality Assessment of User Generated Content | Jul 13, 2022 | Contrastive Learning Optical Flow Estimation | Code Available 0
Subjective and Objective Quality Assessment of High-Motion Sports Videos at Low-Bitrates | Jul 12, 2022 | Video Quality Assessment Visual Question Answering (VQA) | Code Available 0
Video Graph Transformer for Video Question Answering | Jul 12, 2022 | Question Answering Relation | Code Available 1
ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities | Jul 11, 2022 | Articles Few-Shot Learning | Code Available 1
Exploring the Effectiveness of Video Perceptual Representation in Blind Video Quality Assessment | Jul 8, 2022 | Video Quality Assessment Visual Question Answering (VQA) | Code Available 0
OVQA: A Clinically Generated Visual Question Answering Dataset | Jul 7, 2022 | Benchmarking Medical Visual Question Answering | Unverified 0
Knowing Earlier what Right Means to You: A Comprehensive VQA Dataset for Grounding Relative Directions via Multi-Task Learning | Jul 6, 2022 | Diagnostic Multi-Task Learning | Code Available 0
Weakly Supervised Grounding for VQA in Vision-Language Transformers | Jul 5, 2022 | Question Answering Representation Learning | Code Available 1
VGNMN: Video-grounded Neural Module Networks for Video-Grounded Dialogue Systems | Jul 1, 2022 | Information Retrieval Question Answering | Unverified 0
American == White in Multimodal Language-and-Image AI | Jul 1, 2022 | Image Captioning Question Answering | Unverified 0
Modern Question Answering Datasets and Benchmarks: A Survey | Jun 30, 2022 | Deep Learning Question Answering | Unverified 0
A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA | Jun 30, 2022 | Question Answering Retrieval | Code Available 1
EBMs vs. CL: Exploring Self-Supervised Visual Pretraining for Visual Question Answering | Jun 29, 2022 | Contrastive Learning Out of Distribution (OOD) Detection | Unverified 0
Consistency-preserving Visual Question Answering in Medical Imaging | Jun 27, 2022 | Question Answering Visual Question Answering | Code Available 1
From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering | Jun 25, 2022 | Question Answering Visual Question Answering | Unverified 0
Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformer | Jun 22, 2022 | Question Answering Sentence | Code Available 1
VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives | Jun 22, 2022 | Feature Importance Question Answering | Code Available 0
Tell Me the Evidence? Dual Visual-Linguistic Interaction for Answer Grounding | Jun 21, 2022 | Decoder Question Answering | Unverified 0
Grounding Answers for Visual Questions Asked by Visually Impaired People | Jun 20, 2022 | Question Answering Visual Question Answering | Unverified 0
DisCoVQA: Temporal Distortion-Content Transformers for Video Quality Assessment | Jun 20, 2022 | Time Series Analysis Video Quality Assessment | Code Available 0
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks | Jun 17, 2022 | Depth Estimation Image Generation | Unverified 0
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | Jun 16, 2022 | Fill Mask Language Modeling | Code Available 1
MixGen: A New Multi-Modal Data Augmentation | Jun 16, 2022 | Data Augmentation Image-text Retrieval | Code Available 1
Test-Time Adaptation for Visual Document Understanding | Jun 15, 2022 | document understanding Domain Adaptation | Unverified 0
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | Jun 15, 2022 | Described Object Detection Image Captioning | Code Available 1
LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning | Jun 13, 2022 | Transfer Learning Visual Question Answering (VQA) | Code Available 2
Language Models are General-Purpose Interfaces | Jun 13, 2022 | Causal Language Modeling Few-Shot Learning | Unverified 0
GLIPv2: Unifying Localization and Vision-Language Understanding | Jun 12, 2022 | 2D Object Detection Contrastive Learning | Code Available 4
Less Is More: Linear Layers on CLIP Features as Powerful VizWiz Model | Jun 10, 2022 | Question Answering Task 2 | Unverified 0
cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation | Jun 7, 2022 | Knowledge Distillation Question Answering | Code Available 0
From Pixels to Objects: Cubic Visual Attention for Visual Question Answering | Jun 4, 2022 | Object Question Answering | Unverified 0
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge | Jun 3, 2022 | Question Answering Visual Question Answering | Code Available 1
Structured Two-stream Attention Network for Video Question Answering | Jun 2, 2022 | Question Answering Video Question Answering | Unverified 0
VL-BEiT: Generative Vision-Language Pretraining | Jun 2, 2022 | image-classification Image Classification | Unverified 0
REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering | Jun 2, 2022 | Question Answering Retrieval | Code Available 1
Question Modifiers in Visual Question Answering | Jun 1, 2022 | Natural Language Understanding Question Answering | Unverified 0
Fine-tuning vs From Scratch: Do Vision & Language Models Have Similar Capabilities on Out-of-Distribution Visual Question Answering? | Jun 1, 2022 | Question Answering Visual Question Answering | Unverified 0
Un jeu de données pour répondre à des questions visuelles à propos d’entités nommées en utilisant des bases de connaissances (ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities) | Jun 1, 2022 | Question Answering Visual Question Answering | Unverified 0
An Efficient Modern Baseline for FloodNet VQA | May 30, 2022 | Management Visual Question Answering (VQA) | Code Available 0
Visual Superordinate Abstraction for Robust Concept Learning | May 28, 2022 | Attribute Question Answering | Unverified 0
GIT: A Generative Image-to-text Transformer for Vision and Language | May 27, 2022 | Decoder Image Captioning | Code Available 2
V-Doc : Visual questions answers with Documents | May 27, 2022 | Question Answering Question Generation | Unverified 0
Avoiding Barren Plateaus with Classical Deep Neural Networks | May 26, 2022 | Visual Question Answering (VQA) | Unverified 0
Guiding Visual Question Answering with Attention Priors | May 25, 2022 | Question Answering Visual Grounding | Unverified 0
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | May 24, 2022 | Computational Efficiency cross-modal alignment | Code Available 1
Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization | May 24, 2022 | Image Captioning Out-of-Distribution Generalization | Unverified 0
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization | May 24, 2022 | Descriptive Image Captioning | Unverified 0
VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering | May 23, 2022 | Knowledge Graphs Question Answering | Unverified 0
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models | May 23, 2022 | Language Modeling Language Modelling | Code Available 1