A survey on knowledge-enhanced multimodal learning | Nov 19, 2022 | Conditional Image Generation, Factual Visual Question Answering | Unverified | 0
CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering | Nov 19, 2022 | Continual Learning, Question Answering | Unverified | 0
Visual Programming: Compositional visual reasoning without training | Nov 18, 2022 | In-Context Learning, Question Answering | Code Available | 2
Text-Aware Dual Routing Network for Visual Question Answering | Nov 17, 2022 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision | Nov 17, 2022 | Image Captioning, Question Answering | Code Available | 1
AlignVE: Visual Entailment Recognition Based on Alignment Relations | Nov 16, 2022 | Question Answering, Relation | Unverified | 0
PromptCap: Prompt-Guided Task-Aware Image Captioning | Nov 15, 2022 | Image Captioning, Language Modelling | Code Available | 1
MapQA: A Dataset for Question Answering on Choropleth Maps | Nov 15, 2022 | Articles, Question Answering | Code Available | 1
Visually Grounded VQA by Lattice-based Retrieval | Nov 15, 2022 | Information Retrieval, Question Answering | Code Available | 0
Learning to Answer Multilingual and Code-Mixed Questions | Nov 14, 2022 | AI Agent, Question Answering | Unverified | 0
Why Did the Chicken Cross the Road? Rephrasing and Analyzing Ambiguous Questions in VQA | Nov 14, 2022 | Question Generation, Question-Generation | Code Available | 0
MF2-MVQA: A Multi-stage Feature Fusion method for Medical Visual Question Answering | Nov 11, 2022 | Medical Visual Question Answering, Question Answering | Unverified | 0
Watching the News: Towards VideoQA Models that can Read | Nov 10, 2022 | Question Answering, Video Question Answering | Unverified | 0
Towards Reasoning-Aware Explainable VQA | Nov 9, 2022 | Decoder, Explanation Generation | Unverified | 0
ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation | Nov 9, 2022 | Contrastive Learning, Decoder | Unverified | 0
Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives | Nov 9, 2022 | Disentanglement, Video Generation | Code Available | 2
Visual Named Entity Linking: A New Dataset and A Baseline | Nov 9, 2022 | Entity Linking, Image Retrieval | Code Available | 1
CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering | Nov 7, 2022 | Add - PO, Add - PQ | Code Available | 0
Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems | Oct 26, 2022 | Question Answering, Visual Question Answering | Unverified | 0
What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility? | Oct 26, 2022 | Benchmarking, Question Answering | Code Available | 0
Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering | Oct 26, 2022 | Question Answering, Visual Question Answering | Code Available | 0
VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge | Oct 24, 2022 | Question Answering, Visual Question Answering | Code Available | 1
Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision | Oct 24, 2022 | cross-modal alignment, Cross-Modal Retrieval | Unverified | 0
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data | Oct 23, 2022 | Image Captioning, Image-text Retrieval | Unverified | 0
PoseScript: Linking 3D Human Poses and Natural Language | Oct 21, 2022 | Cross-Modal Retrieval, Image Captioning | Code Available | 2
Image Semantic Relation Generation | Oct 19, 2022 | Image Retrieval, Image Segmentation | Unverified | 0
Aligning MAGMA by Few-Shot Learning and Finetuning | Oct 18, 2022 | Few-Shot Learning, Image Captioning | Unverified | 0
Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering | Oct 18, 2022 | Passage Retrieval, Question Answering | Unverified | 0
Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training | Oct 17, 2022 | Image Captioning, Network Interpretation | Code Available | 0
Meta-Learning via Classifier(-free) Diffusion Guidance | Oct 17, 2022 | Few-Shot Learning, Image Generation | Code Available | 1
Vision-Language Pre-training: Basics, Recent Advances, and Future Trends | Oct 17, 2022 | Few-Shot Learning, Image Captioning | Code Available | 3
MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting | Oct 13, 2022 | Image Captioning, Question Answering | Code Available | 1
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding | Oct 12, 2022 | document-image-classification, Document Image Classification | Code Available | 1
SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models | Oct 12, 2022 | Object, Question Answering | Code Available | 1
Neighbourhood Representative Sampling for Efficient End-to-end Video Quality Assessment | Oct 11, 2022 | Video Quality Assessment, Visual Question Answering (VQA) | Code Available | 2
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model | Oct 11, 2022 | Contrastive Learning, Image-text matching | Code Available | 1
Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut Learning in VQA | Oct 10, 2022 | Question Answering, Visual Question Answering | Code Available | 1
DCVQE: A Hierarchical Transformer for Video Quality Assessment | Oct 10, 2022 | Video Quality Assessment, Visual Question Answering (VQA) | Unverified | 0
Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning | Oct 10, 2022 | Contrastive Learning, Question Answering | Code Available | 1
Multi-Modal Fusion Transformer for Visual Question Answering in Remote Sensing | Oct 10, 2022 | Question Answering, Representation Learning | Unverified | 0
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning | Oct 9, 2022 | Image-text Retrieval, multimodal interaction | Unverified | 0
HVS Revisited: A Comprehensive Video Quality Assessment Framework | Oct 9, 2022 | Video Quality Assessment, Visual Question Answering (VQA) | Unverified | 0
Retrieval Augmented Visual Question Answering with Outside Knowledge | Oct 7, 2022 | Answer Generation, Diagnostic | Code Available | 2
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | Oct 7, 2022 | Chart Question Answering, Diversity | Code Available | 2
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning | Oct 4, 2022 | Image Captioning, Sentence | Code Available | 0
Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach | Oct 3, 2022 | Referring Expression, Robot Manipulation | Code Available | 0
On the Effects of Video Grounding on Language Models | Oct 1, 2022 | Image Captioning, Question Answering | Unverified | 0
Dual Capsule Attention Mask Network with Mutual Learning for Visual Question Answering | Oct 1, 2022 | Question Answering, Visual Question Answering | Unverified | 0
A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering | Oct 1, 2022 | Medical Visual Question Answering, Question Answering | Code Available | 0
Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering | Sep 30, 2022 | Continual Learning, Question Answering | Code Available | 0