Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model Jun 15, 2024 Question Answering Video Understanding
A Question-Centric Model for Visual Question Answering in Medical Imaging Mar 2, 2020 Medical Image Analysis Question Answering
Did the Model Understand the Question? May 14, 2018 model Question Answering
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data Jul 24, 2020 Visual Dialog Visual Question Answering (VQA)
InstructOCR: Instruction Boosting Scene Text Spotting Dec 20, 2024 Optical Character Recognition (OCR) Text Spotting
VinVL+L: Enriching Visual Representation with Location Context in VQA Feb 22, 2023 Question Answering TAG
QACE: Asking Questions to Evaluate an Image Caption Aug 28, 2021 Question Answering Visual Question Answering (VQA)
Inferring and Executing Programs for Visual Reasoning May 10, 2017 Visual Question Answering (VQA) Visual Reasoning
Incorporating Probing Signals into Multimodal Machine Translation via Visual Question-Answering Pairs Oct 26, 2023 Attribute Machine Translation
QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models Apr 15, 2025 Question Answering Visual Question Answering
Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models May 26, 2025 image-classification Image Classification
Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts Nov 15, 2023 Question Answering Sentence
Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling for Visual Question Answering Aug 10, 2017 Question Answering Visual Question Answering
QLEVR: A Diagnostic Dataset for Quantificational Language and Elementary Visual Reasoning May 6, 2022 Diagnostic Question Answering
ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities Nov 16, 2021 Articles Face Recognition
Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge Aug 9, 2017 GPU Visual Question Answering
Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking Oct 11, 2021 Benchmarking Question Answering
Quantifying and Alleviating the Language Prior Problem in Visual Question Answering May 13, 2019 Information Retrieval Question Answering
Improving the Cross-Lingual Generalisation in Visual Question Answering Sep 7, 2022 Cross-Lingual Transfer Question Answering
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts Nov 18, 2024 Benchmarking Multimodal Large Language Model
Query and Attention Augmentation for Knowledge-Based Explainable Reasoning Jan 1, 2022 Question Answering Visual Question Answering
VizWiz Grand Challenge: Answering Visual Questions from Blind People Feb 22, 2018 Question Answering Visual Question Answering
Improved RAMEN: Towards Domain Generalization for Visual Question Answering Sep 6, 2021 Domain Generalization Question Answering
Design as Desired: Utilizing Visual Question Answering for Multimodal Pre-training Mar 30, 2024 Contrastive Learning Question Answering
VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives Jun 22, 2022 Feature Importance Question Answering
Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks Jan 12, 2023 Cross-Modal Retrieval Open-Ended Question Answering
Adaptive Score Alignment Learning for Continual Perceptual Quality Assessment of 360-Degree Videos in Virtual Reality Feb 27, 2025 Video Quality Assessment Visual Question Answering (VQA)
Toward Multi-Granularity Decision-Making: Explicit Visual Reasoning with Hierarchical Knowledge Jan 1, 2023 Decision Making Question Answering
Applying recent advances in Visual Question Answering to Record Linkage Jul 12, 2020 Question Answering Visual Question Answering
Delving Deeper into Cross-lingual Visual Question Answering Feb 15, 2022 Inductive Bias Question Answering
Towards a performance analysis on pre-trained Visual Question Answering models for autonomous driving Jul 18, 2023 Autonomous Driving Model Selection
X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering Jul 24, 2021 Attribute Out-of-Distribution Generalization
Towards a Unified Multimodal Reasoning Framework Dec 22, 2023 Multimodal Reasoning Multiple-choice
QuIIL at T3 challenge: Towards Automation in Life-Saving Intervention Procedures from First-Person View Jul 18, 2024 Action Anticipation Action Recognition
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering Apr 3, 2018 Visual Question Answering Visual Question Answering (VQA)
Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis Sep 21, 2023 Cross-Modal Retrieval Image Captioning
Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction Nov 18, 2015 Image Retrieval with Multi-Modal Query Parameter Prediction
Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering Sep 15, 2021 Image Captioning Knowledge Graphs
BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis Aug 10, 2021 Language Modeling Language Modelling
RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis Apr 25, 2024 Segmentation Sentence
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks Aug 22, 2022 All Cross-Modal Retrieval
Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions Dec 11, 2024 Benchmarking Question Answering
Answer Them All! Toward Universal Visual Question Answering Models Mar 1, 2019 All Question Answering
Diversify, Rationalize, and Combine: Ensembling Multiple QA Strategies for Zero-shot Knowledge-based VQA Jun 18, 2024 Question Answering Visual Question Answering
Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model Jul 31, 2024 Benchmarking Large Language Model
Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks Jul 30, 2024 Visual Question Answering (VQA)
Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering Mar 24, 2022 GPU Question Answering
ILLUME: Rationalizing Vision-Language Models through Human Interactions Aug 17, 2022 Image Captioning Question Answering
Towards Efficient and Robust VQA-NLE Data Generation with Large Vision-Language Models Sep 23, 2024 Decision Making Question Answering
Barlow constrained optimization for Visual Question Answering Mar 7, 2022 Question Answering Visual Question Answering