Visual Question Answering From Another Perspective: CLEVR Mental Rotation Tests | Dec 3, 2022 | Question Answering, Visual Question Answering | Code Available
Compound Tokens: Channel Fusion for Vision-Language Representation Learning | Dec 2, 2022 | Decoder, Language Modeling | Unverified
Semi-supervised Learning of Perceptual Video Quality by Generating Consistent Pairwise Pseudo-Ranks | Nov 30, 2022 | Video Quality Assessment, Visual Question Answering (VQA) | Unverified
Optimizing Explanations by Network Canonization and Hyperparameter Search | Nov 30, 2022 | Explainable Artificial Intelligence (XAI), image-classification | Unverified
PiggyBack: Pretrained Visual Question Answering Environment for Backing up Non-deep Learning Professionals | Nov 29, 2022 | Deep Learning, Question Answering | Unverified
Neuro-Symbolic Spatio-Temporal Reasoning | Nov 28, 2022 | AI Agent, Image Segmentation | Unverified
Look, Read and Ask: Learning to Ask Questions by Reading Text in Images | Nov 23, 2022 | Optical Character Recognition (OCR), Question Answering | Unverified
A Short Survey of Systematic Generalization | Nov 22, 2022 | Survey, Systematic Generalization | Unverified
Cross-Modal Contrastive Learning for Robust Reasoning in VQA | Nov 21, 2022 | Contrastive Learning, Question Answering | Code Available
Enhancing Self-Consistency and Performance of Pre-Trained Language Models through Natural Language Inference | Nov 21, 2022 | Natural Language Inference, Question Answering | Unverified
A survey on knowledge-enhanced multimodal learning | Nov 19, 2022 | Conditional Image Generation, Factual Visual Question Answering | Unverified
CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering | Nov 19, 2022 | Continual Learning, Question Answering | Unverified
Text-Aware Dual Routing Network for Visual Question Answering | Nov 17, 2022 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified
AlignVE: Visual Entailment Recognition Based on Alignment Relations | Nov 16, 2022 | Question Answering, Relation | Unverified
Visually Grounded VQA by Lattice-based Retrieval | Nov 15, 2022 | Information Retrieval, Question Answering | Code Available
Why Did the Chicken Cross the Road? Rephrasing and Analyzing Ambiguous Questions in VQA | Nov 14, 2022 | Question Generation, Question-Generation | Code Available
Learning to Answer Multilingual and Code-Mixed Questions | Nov 14, 2022 | AI Agent, Question Answering | Unverified
MF2-MVQA: A Multi-stage Feature Fusion method for Medical Visual Question Answering | Nov 11, 2022 | Medical Visual Question Answering, Question Answering | Unverified
Watching the News: Towards VideoQA Models that can Read | Nov 10, 2022 | Question Answering, Video Question Answering | Unverified
Towards Reasoning-Aware Explainable VQA | Nov 9, 2022 | Decoder, Explanation Generation | Unverified
ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation | Nov 9, 2022 | Contrastive Learning, Decoder | Unverified
CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering | Nov 7, 2022 | Add - PO, Add - PQ | Code Available
Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering | Oct 26, 2022 | Question Answering, Visual Question Answering | Code Available
What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility? | Oct 26, 2022 | Benchmarking, Question Answering | Code Available
Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems | Oct 26, 2022 | Question Answering, Visual Question Answering | Unverified
Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision | Oct 24, 2022 | cross-modal alignment, Cross-Modal Retrieval | Unverified
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data | Oct 23, 2022 | Image Captioning, Image-text Retrieval | Unverified
Image Semantic Relation Generation | Oct 19, 2022 | Image Retrieval, Image Segmentation | Unverified
Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering | Oct 18, 2022 | Passage Retrieval, Question Answering | Unverified
Aligning MAGMA by Few-Shot Learning and Finetuning | Oct 18, 2022 | Few-Shot Learning, Image Captioning | Unverified
Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training | Oct 17, 2022 | Image Captioning, Network Interpretation | Code Available
DCVQE: A Hierarchical Transformer for Video Quality Assessment | Oct 10, 2022 | Video Quality Assessment, Visual Question Answering (VQA) | Unverified
Multi-Modal Fusion Transformer for Visual Question Answering in Remote Sensing | Oct 10, 2022 | Question Answering, Representation Learning | Unverified
HVS Revisited: A Comprehensive Video Quality Assessment Framework | Oct 9, 2022 | Video Quality Assessment, Visual Question Answering (VQA) | Unverified
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning | Oct 9, 2022 | Image-text Retrieval, multimodal interaction | Unverified
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning | Oct 4, 2022 | Image Captioning, Sentence | Code Available
Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach | Oct 3, 2022 | Referring Expression, Robot Manipulation | Code Available
On the Effects of Video Grounding on Language Models | Oct 1, 2022 | Image Captioning, Question Answering | Unverified
A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering | Oct 1, 2022 | Medical Visual Question Answering, Question Answering | Code Available
Dual Capsule Attention Mask Network with Mutual Learning for Visual Question Answering | Oct 1, 2022 | Question Answering, Visual Question Answering | Unverified
Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering | Sep 30, 2022 | Continual Learning, Question Answering | Code Available
RepsNet: Combining Vision with Language for Automated Medical Reports | Sep 27, 2022 | Contrastive Learning, Decoder | Unverified
Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Sep 21, 2022 | Action Detection, Action Recognition | Code Available
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering | Sep 21, 2022 | Image Captioning, Optical Character Recognition (OCR) | Unverified
Continual VQA for Disaster Response Systems | Sep 21, 2022 | Disaster Response, Management | Code Available
Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances | Sep 18, 2022 | Attribute, Question Answering | Code Available
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks | Sep 15, 2022 | Action Classification, Action Recognition | Unverified
LAVIS: A Library for Language-Vision Intelligence | Sep 15, 2022 | Benchmarking, Image Captioning | Unverified
MUST-VQA: MUltilingual Scene-text VQA | Sep 14, 2022 | Question Answering, Visual Question Answering | Unverified
PaLI: A Jointly-Scaled Multilingual Language-Image Model | Sep 14, 2022 | Decoder, Few-Shot Image Classification | Unverified