HRVQA: A Visual Question Answering Benchmark for High-Resolution Aerial Images | Jan 23, 2023 | Attribute, Question Answering | Unverified | 0
Champion Solution for the WSDM2023 Toloka VQA Challenge | Jan 22, 2023 | Question Answering, Visual Grounding | Code Available | 3
Towards Models that Can See and Read | Jan 18, 2023 | Decoder, Image Captioning | Unverified | 0
Curriculum Script Distillation for Multilingual Visual Question Answering | Jan 17, 2023 | Question Answering, Visual Question Answering | Unverified | 0
Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks | Jan 12, 2023 | Cross-Modal Retrieval, Open-Ended Question Answering | Code Available | 0
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images | Jan 12, 2023 | Evidence Selection, Question Answering | Code Available | 1
Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering | Jan 11, 2023 | Question Answering, Reading Comprehension | Code Available | 1
Adaptively Clustering Neighbor Elements for Image-Text Generation | Jan 5, 2023 | Clustering, Decoder | Code Available | 0
PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3 | Jan 1, 2023 | Image Captioning, Question Answering | Unverified | 0
Variational Causal Inference Network for Explanatory Visual Question Answering | Jan 1, 2023 | Explanation Generation, Explanatory Visual Question Answering | Code Available | 1
Toward Multi-Granularity Decision-Making: Explicit Visual Reasoning with Hierarchical Knowledge | Jan 1, 2023 | Decision Making, Question Answering | Code Available | 0
Decouple Before Interact: Multi-Modal Prompt Learning for Continual Visual Question Answering | Jan 1, 2023 | Continual Learning, Language Modelling | Unverified | 0
RMLVQA: A Margin Loss Approach for Visual Question Answering With Language Biases | Jan 1, 2023 | Question Answering, Visual Question Answering | Unverified | 0
From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models | Jan 1, 2023 | Question Answering, Visual Question Answering | Unverified | 0
VQACL: A Novel Visual Question Answering Continual Learning Setting | Jan 1, 2023 | Continual Learning, Question Answering | Code Available | 1
Dynamic Inference With Grounding Based Vision and Language Models | Jan 1, 2023 | Language Modelling, Referring Expression | Unverified | 0
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | Dec 30, 2022 | cross-modal alignment, TGIF-Action | Unverified | 0
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges | Dec 26, 2022 | Representation Learning, Visual Question Answering (VQA) | Unverified | 0
When are Lemons Purple? The Concept Association Bias of Vision-Language Models | Dec 22, 2022 | Attribute, image-classification | Unverified | 0
From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models | Dec 21, 2022 | Question Answering, Visual Question Answering | Code Available | 0
UnICLAM:Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering | Dec 21, 2022 | Data Augmentation, Decision Making | Unverified | 0
DePlot: One-shot visual language reasoning by plot-to-table translation | Dec 20, 2022 | Chart Question Answering, Factual Inconsistency Detection in Chart Captioning | Unverified | 0
Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason? | Dec 20, 2022 | Question Answering, Representation Learning | Unverified | 0
MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering | Dec 19, 2022 | Form, Question Answering | Code Available | 1
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering | Dec 19, 2022 | Chart Question Answering, Data Summarization | Unverified | 0
SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering | Dec 16, 2022 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0
CLIPPO: Image-and-Language Understanding from Pixels Only | Dec 15, 2022 | Contrastive Learning, image-classification | Unverified | 0
REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory | Dec 10, 2022 | Image Captioning, Language Modeling | Code Available | 0
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | Dec 9, 2022 | Question Answering, Retrieval | Unverified | 0
Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations | Dec 8, 2022 | Explanation Generation, Visual Entailment | Code Available | 1
ParsVQA-Caps: A Benchmark for Visual Question Answering and Image Captioning in Persian | Dec 7, 2022 | Image Captioning, Question Answering | Unverified | 0
Hierarchical multimodal transformers for Multi-Page DocVQA | Dec 7, 2022 | Decoder, Question Answering | Code Available | 1
Review of Ansatz Designing Techniques for Variational Quantum Algorithms | Dec 7, 2022 | Visual Question Answering (VQA) | Unverified | 0
InternVideo: General Video Foundation Models via Generative and Discriminative Learning | Dec 6, 2022 | Action Classification, Action Recognition | Code Available | 4
Unifying Vision, Text, and Layout for Universal Document Processing | Dec 5, 2022 | Document AI, document understanding | Code Available | 3
Visual Question Answering From Another Perspective: CLEVR Mental Rotation Tests | Dec 3, 2022 | Question Answering, Visual Question Answering | Code Available | 0
Compound Tokens: Channel Fusion for Vision-Language Representation Learning | Dec 2, 2022 | Decoder, Language Modeling | Unverified | 0
Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning | Dec 1, 2022 | Domain Generalization, Question Answering | Code Available | 1
Semi-supervised Learning of Perceptual Video Quality by Generating Consistent Pairwise Pseudo-Ranks | Nov 30, 2022 | Video Quality Assessment, Visual Question Answering (VQA) | Unverified | 0
Optimizing Explanations by Network Canonization and Hyperparameter Search | Nov 30, 2022 | Explainable Artificial Intelligence (XAI), image-classification | Unverified | 0
PiggyBack: Pretrained Visual Question Answering Environment for Backing up Non-deep Learning Professionals | Nov 29, 2022 | Deep Learning, Question Answering | Unverified | 0
Neuro-Symbolic Spatio-Temporal Reasoning | Nov 28, 2022 | AI Agent, Image Segmentation | Unverified | 0
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning | Nov 24, 2022 | cross-modal alignment, Image-text Retrieval | Code Available | 1
Self-supervised vision-language pretraining for Medical visual question answering | Nov 24, 2022 | Contrastive Learning, Image-text matching | Code Available | 1
Look, Read and Ask: Learning to Ask Questions by Reading Text in Images | Nov 23, 2022 | Optical Character Recognition (OCR), Question Answering | Unverified | 0
A Short Survey of Systematic Generalization | Nov 22, 2022 | Survey, Systematic Generalization | Unverified | 0
X^2-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | Nov 22, 2022 | All, Cross-Modal Retrieval | Code Available | 2
Cross-Modal Contrastive Learning for Robust Reasoning in VQA | Nov 21, 2022 | Contrastive Learning, Question Answering | Code Available | 0
Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations | Nov 21, 2022 | Contrastive Learning, Representation Learning | Code Available | 1
Enhancing Self-Consistency and Performance of Pre-Trained Language Models through Natural Language Inference | Nov 21, 2022 | Natural Language Inference, Question Answering | Unverified | 0