Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering | Sep 14, 2022 | Adversarial Robustness, Question Answering | Unverified
PreSTU: Pre-Training for Scene-Text Understanding | Sep 12, 2022 | Decoder, Image Captioning | Unverified
Pre-training image-language transformers for open-vocabulary tasks | Sep 9, 2022 | Question Answering, Visual Entailment | Unverified
Improving the Cross-Lingual Generalisation in Visual Question Answering | Sep 7, 2022 | Cross-Lingual Transfer, Question Answering | Code Available
Evaluating Point Cloud from Moving Camera Videos: A No-Reference Metric | Aug 30, 2022 | Image Quality Assessment, Point Cloud Quality Assessment | Code Available
Bidirectional Contrastive Split Learning for Visual Question Answering | Aug 24, 2022 | Adversarial Attack, Backdoor Attack | Unverified
FashionVQA: A Domain-Specific Visual Question Answering System | Aug 24, 2022 | Question Answering, Visual Question Answering | Unverified
How good are deep models in understanding the generated images? | Aug 23, 2022 | Object, Object Recognition | Unverified
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks | Aug 22, 2022 | All, Cross-Modal Retrieval | Code Available
VLMAE: Vision-Language Masked Autoencoder | Aug 19, 2022 | Image-text Retrieval, Language Modeling | Unverified
Understanding Attention for Vision-and-Language Tasks | Aug 17, 2022 | Image Generation, Image Retrieval | Code Available
ILLUME: Rationalizing Vision-Language Models through Human Interactions | Aug 17, 2022 | Image Captioning, Question Answering | Code Available
Aesthetic Visual Question Answering of Photographs | Aug 10, 2022 | Question Answering, Sentiment Analysis | Unverified
Prompt Tuning for Generative Multimodal Pretrained Models | Aug 4, 2022 | Image Captioning, Visual Entailment | Unverified
NAPA: Intermediate-level Variational Native-pulse Ansatz for Variational Quantum Algorithms | Aug 2, 2022 | Neural Architecture Search, Visual Question Answering (VQA) | Unverified
Video Question Answering with Iterative Video-Text Co-Tokenization | Aug 1, 2022 | Question Answering, Video Question Answering | Unverified
Parameter-Parallel Distributed Variational Quantum Algorithm | Jul 31, 2022 | Visual Question Answering (VQA) | Unverified
Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base | Jul 27, 2022 | Question Answering, Semantic Similarity | Unverified
Towards Complex Document Understanding By Discrete Reasoning | Jul 25, 2022 | document understanding, Question Answering | Unverified
Is GPT-3 all you need for Visual Question Answering in Cultural Heritage? | Jul 25, 2022 | All, Question Answering | Unverified
WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models | Jul 25, 2022 | Common Sense Reasoning, General Knowledge | Code Available
Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem | Jul 24, 2022 | Diagnostic, Question Answering | Unverified
Semantic-aware Modular Capsule Routing for Visual Question Answering | Jul 21, 2022 | Question Answering, Visual Question Answering | Unverified
QSAN: A Near-term Achievable Quantum Self-Attention Network | Jul 14, 2022 | Binary Classification, image-classification | Unverified
Multiview Contrastive Learning for Completely Blind Video Quality Assessment of User Generated Content | Jul 13, 2022 | Contrastive Learning, Optical Flow Estimation | Code Available
Subjective and Objective Quality Assessment of High-Motion Sports Videos at Low-Bitrates | Jul 12, 2022 | Video Quality Assessment, Visual Question Answering (VQA) | Code Available
Exploring the Effectiveness of Video Perceptual Representation in Blind Video Quality Assessment | Jul 8, 2022 | Video Quality Assessment, Visual Question Answering (VQA) | Code Available
OVQA: A Clinically Generated Visual Question Answering Dataset | Jul 7, 2022 | Benchmarking, Medical Visual Question Answering | Unverified
Knowing Earlier what Right Means to You: A Comprehensive VQA Dataset for Grounding Relative Directions via Multi-Task Learning | Jul 6, 2022 | Diagnostic, Multi-Task Learning | Code Available
VGNMN: Video-grounded Neural Module Networks for Video-Grounded Dialogue Systems | Jul 1, 2022 | Information Retrieval, Question Answering | Unverified
American == White in Multimodal Language-and-Image AI | Jul 1, 2022 | Image Captioning, Question Answering | Unverified
Modern Question Answering Datasets and Benchmarks: A Survey | Jun 30, 2022 | Deep Learning, Question Answering | Unverified
EBMs vs. CL: Exploring Self-Supervised Visual Pretraining for Visual Question Answering | Jun 29, 2022 | Contrastive Learning, Out of Distribution (OOD) Detection | Unverified
From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering | Jun 25, 2022 | Question Answering, Visual Question Answering | Unverified
VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives | Jun 22, 2022 | Feature Importance, Question Answering | Code Available
Tell Me the Evidence? Dual Visual-Linguistic Interaction for Answer Grounding | Jun 21, 2022 | Decoder, Question Answering | Unverified
DisCoVQA: Temporal Distortion-Content Transformers for Video Quality Assessment | Jun 20, 2022 | Time Series Analysis, Video Quality Assessment | Code Available
Grounding Answers for Visual Questions Asked by Visually Impaired People | Jun 20, 2022 | Question Answering, Visual Question Answering | Unverified
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks | Jun 17, 2022 | Depth Estimation, Image Generation | Unverified
Test-Time Adaptation for Visual Document Understanding | Jun 15, 2022 | document understanding, Domain Adaptation | Unverified
Language Models are General-Purpose Interfaces | Jun 13, 2022 | Causal Language Modeling, Few-Shot Learning | Unverified
Less Is More: Linear Layers on CLIP Features as Powerful VizWiz Model | Jun 10, 2022 | Question Answering, Task 2 | Unverified
cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation | Jun 7, 2022 | Knowledge Distillation, Question Answering | Code Available
From Pixels to Objects: Cubic Visual Attention for Visual Question Answering | Jun 4, 2022 | Object, Question Answering | Unverified
Structured Two-stream Attention Network for Video Question Answering | Jun 2, 2022 | Question Answering, Video Question Answering | Unverified
VL-BEiT: Generative Vision-Language Pretraining | Jun 2, 2022 | image-classification, Image Classification | Unverified
Un jeu de données pour répondre à des questions visuelles à propos d’entités nommées en utilisant des bases de connaissances (ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities) | Jun 1, 2022 | Question Answering, Visual Question Answering | Unverified
Question Modifiers in Visual Question Answering | Jun 1, 2022 | Natural Language Understanding, Question Answering | Unverified
Fine-tuning vs From Scratch: Do Vision & Language Models Have Similar Capabilities on Out-of-Distribution Visual Question Answering? | Jun 1, 2022 | Question Answering, Visual Question Answering | Unverified
An Efficient Modern Baseline for FloodNet VQA | May 30, 2022 | Management, Visual Question Answering (VQA) | Code Available