Linearly Mapping from Image to Text Space | Sep 30, 2022 | Image Captioning Image to text | Code | Code Available | 1
TVLT: Textless Vision-Language Transformer | Sep 28, 2022 | Automatic Speech Recognition (ASR) Image Retrieval | Code | Code Available | 1
RepsNet: Combining Vision with Language for Automated Medical Reports | Sep 27, 2022 | Contrastive Learning Decoder | - | Unverified | 0
Towards Explainable 3D Grounded Visual Question Answering: A New Benchmark and Strong Baseline | Sep 24, 2022 | Question Answering Visual Question Answering | Code | Code Available | 1
Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Sep 21, 2022 | Action Detection Action Recognition | Code | Code Available | 0
Continual VQA for Disaster Response Systems | Sep 21, 2022 | Disaster Response Management | Code | Code Available | 0
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering | Sep 21, 2022 | Image Captioning Optical Character Recognition (OCR) | - | Unverified | 0
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering | Sep 20, 2022 | Multimodal Deep Learning Multimodal Reasoning | Code | Code Available | 2
Panoramic Vision Transformer for Saliency Detection in 360° Videos | Sep 19, 2022 | Saliency Detection Saliency Prediction | Code | Code Available | 1
Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances | Sep 18, 2022 | Attribute Question Answering | Code | Code Available | 0
LAVIS: A Library for Language-Vision Intelligence | Sep 15, 2022 | Benchmarking Image Captioning | - | Unverified | 0
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks | Sep 15, 2022 | Action Classification Action Recognition | - | Unverified | 0
PaLI: A Jointly-Scaled Multilingual Language-Image Model | Sep 14, 2022 | Decoder Few-Shot Image Classification | - | Unverified | 0
Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering | Sep 14, 2022 | Adversarial Robustness Question Answering | - | Unverified | 0
MUST-VQA: MUltilingual Scene-text VQA | Sep 14, 2022 | Question Answering Visual Question Answering | - | Unverified | 0
PreSTU: Pre-Training for Scene-Text Understanding | Sep 12, 2022 | Decoder Image Captioning | - | Unverified | 0
MaXM: Towards Multilingual Visual Question Answering | Sep 12, 2022 | Question Answering Translation | Code | Code Available | 1
Pre-training image-language transformers for open-vocabulary tasks | Sep 9, 2022 | Question Answering Visual Entailment | - | Unverified | 0
Improving the Cross-Lingual Generalisation in Visual Question Answering | Sep 7, 2022 | Cross-Lingual Transfer Question Answering | Code | Code Available | 0
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | Sep 4, 2022 | Fill Mask Optical Flow Estimation | Code | Code Available | 1
2BiVQA: Double Bi-LSTM based Video Quality Assessment of UGC Videos | Aug 31, 2022 | Video Quality Assessment Visual Question Answering (VQA) | Code | Code Available | 1
Evaluating Point Cloud from Moving Camera Videos: A No-Reference Metric | Aug 30, 2022 | Image Quality Assessment Point Cloud Quality Assessment | Code | Code Available | 0
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Aug 29, 2022 | cross-modal alignment Image-text Retrieval | Code | Code Available | 1
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task | Aug 24, 2022 | Continual Learning Question Answering | Code | Code Available | 1
Bidirectional Contrastive Split Learning for Visual Question Answering | Aug 24, 2022 | Adversarial Attack Backdoor Attack | - | Unverified | 0
FashionVQA: A Domain-Specific Visual Question Answering System | Aug 24, 2022 | Question Answering Visual Question Answering | - | Unverified | 0
How good are deep models in understanding the generated images? | Aug 23, 2022 | Object Object Recognition | - | Unverified | 0
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks | Aug 22, 2022 | All Cross-Modal Retrieval | Code | Code Available | 0
VLMAE: Vision-Language Masked Autoencoder | Aug 19, 2022 | Image-text Retrieval Language Modeling | - | Unverified | 0
Understanding Attention for Vision-and-Language Tasks | Aug 17, 2022 | Image Generation Image Retrieval | Code | Code Available | 0
ILLUME: Rationalizing Vision-Language Models through Human Interactions | Aug 17, 2022 | Image Captioning Question Answering | Code | Code Available | 0
Aesthetic Visual Question Answering of Photographs | Aug 10, 2022 | Question Answering Sentiment Analysis | - | Unverified | 0
CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning | Aug 10, 2022 | Math Mathematical Reasoning | Code | Code Available | 1
ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding | Aug 5, 2022 | Image Retrieval Question Answering | Code | Code Available | 1
Prompt Tuning for Generative Multimodal Pretrained Models | Aug 4, 2022 | Image Captioning Visual Entailment | - | Unverified | 0
TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation | Aug 3, 2022 | Answer Generation Question-Answer-Generation | Code | Code Available | 1
NAPA: Intermediate-level Variational Native-pulse Ansatz for Variational Quantum Algorithms | Aug 2, 2022 | Neural Architecture Search Visual Question Answering (VQA) | - | Unverified | 0
Generative Bias for Robust Visual Question Answering | Aug 1, 2022 | Knowledge Distillation Question Answering | Code | Code Available | 1
Video Question Answering with Iterative Video-Text Co-Tokenization | Aug 1, 2022 | Question Answering Video Question Answering | - | Unverified | 0
Parameter-Parallel Distributed Variational Quantum Algorithm | Jul 31, 2022 | Visual Question Answering (VQA) | - | Unverified | 0
Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base | Jul 27, 2022 | Question Answering Semantic Similarity | - | Unverified | 0
LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection | Jul 26, 2022 | Decoder Knowledge Graphs | Code | Code Available | 1
Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering | Jul 26, 2022 | Causal Inference Question Answering | Code | Code Available | 1
WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models | Jul 25, 2022 | Common Sense Reasoning General Knowledge | Code | Code Available | 0
Is GPT-3 all you need for Visual Question Answering in Cultural Heritage? | Jul 25, 2022 | All Question Answering | - | Unverified | 0
Towards Complex Document Understanding By Discrete Reasoning | Jul 25, 2022 | document understanding Question Answering | - | Unverified | 0
Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem | Jul 24, 2022 | Diagnostic Question Answering | - | Unverified | 0
Semantic-aware Modular Capsule Routing for Visual Question Answering | Jul 21, 2022 | Question Answering Visual Question Answering | - | Unverified | 0
Rethinking Data Augmentation for Robust Visual Question Answering | Jul 18, 2022 | Data Augmentation Knowledge Distillation | Code | Code Available | 1
Clover: Towards A Unified Video-Language Alignment and Fusion Model | Jul 16, 2022 | Language Modeling Language Modelling | Code | Code Available | 1