Object Attribute Matters in Visual Question Answering Dec 20, 2023 Attribute Graph Neural Network
AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results Apr 24, 2024 Video Quality Assessment Visual Question Answering (VQA)
Weakly Supervised Relative Spatial Reasoning for Visual Question Answering Sep 4, 2021 Question Answering Spatial Reasoning
EaSe: A Diagnostic Tool for VQA based on Answer Diversity Jun 1, 2021 Diagnostic Diversity
OG-SGG: Ontology-Guided Scene Graph Generation. A Case Study in Transfer Learning for Telepresence Robotics Feb 21, 2022 BIG-bench Machine Learning Graph Generation
CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions Jan 3, 2019 Diagnostic Image Segmentation
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks Mar 29, 2023 Cross-Modal Retrieval Decoder
SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks Nov 29, 2024 Question Answering Visual Question Answering
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering Dec 2, 2016 Visual Question Answering Visual Question Answering (VQA)
OmniFusion Technical Report Apr 9, 2024 MM-Vet TextVQA
M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base Dec 16, 2023 cross-modal alignment Knowledge Graphs
OmniNet: A unified architecture for multi-modal multi-task learning Jul 17, 2019 Image Captioning Multi-Task Learning
Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery Oct 29, 2023 Deep Learning Multimodal Deep Learning
AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss May 5, 2021 Question Answering Visual Question Answering
CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images Jun 1, 2021 Question Answering Visual Question Answering
LXMERT Model Compression for Visual Question Answering Oct 23, 2023 model Model Compression
Dynamic Memory Networks for Visual and Textual Question Answering Mar 4, 2016 Question Answering Visual Question Answering
Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering Mar 6, 2022 Graph Attention Question Answering
DVQA: Understanding Data Visualizations via Question Answering Jan 24, 2018 Articles Chart Question Answering
Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach Oct 3, 2022 Referring Expression Robot Manipulation
On Modality Bias in the TVQA Dataset Dec 18, 2020 Question Answering Video Question Answering
On Modality Bias Recognition and Reduction Feb 25, 2022 Action Recognition Multi-modal Classification
Variational Quantum Optimization with Continuous Bandits Feb 6, 2025 Visual Question Answering (VQA)
Targeted Visual Prompting for Medical Visual Question Answering Aug 6, 2024 Medical Visual Question Answering Question Answering
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue Nov 17, 2019 feature selection Question Answering
CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images Apr 13, 2021 Question Answering Visual Question Answering
Dual Recurrent Attention Units for Visual Question Answering Feb 1, 2018 Question Answering Visual Question Answering
LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering May 29, 2021 Question Answering Visual Question Answering
Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View Oct 30, 2020 Face Recognition image-classification
Dual Attention Networks for Visual Reference Resolution in Visual Dialog Feb 25, 2019 AI Agent Question Answering
Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering May 21, 2015 Question Answering Sentence
A Dataset and Architecture for Visual Reasoning with a Working Memory Mar 16, 2018 Diagnostic Logical Reasoning
CLEAR: A Dataset for Compositional Language and Elementary Acoustic Reasoning Nov 26, 2018 Acoustic Question Answering Question Answering
Looking Beyond Visible Cues: Implicit Video Question Answering via Dual-Clue Reasoning Jun 9, 2025 Future prediction Question Answering
Logical Implications for Visual Question Answering Consistency Mar 16, 2023 Language Modeling Language Modelling
Locally Smoothed Neural Networks Nov 22, 2017 Face Verification Question Answering
LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models Aug 26, 2024 Large Language Model Video Quality Assessment
Open-Ended Multi-Modal Relational Reasoning for Video Question Answering Dec 1, 2020 Question Answering Relational Reasoning
Open-Ended Visual Question-Answering Oct 9, 2016 Question Answering Sentence
Synthetic Document Question Answering in Hungarian May 29, 2025 Optical Character Recognition (OCR) Question Answering
LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery Feb 26, 2024 Continual Learning Exemplar-Free
LLaVA-OneVision: Easy Visual Task Transfer Aug 6, 2024 3D Question Answering (3D-QA)
Open-Set Knowledge-Based Visual Question Answering with Inference Paths Oct 12, 2023 Knowledge Graphs Multi-class Classification
OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese May 7, 2023 Information Retrieval Question Answering
LININ: Logic Integrated Neural Inference Network for Explanatory Visual Question Answering Dec 24, 2024 Explanatory Visual Question Answering Multimodal Reasoning
Systematic Generalization: What Is Required and Can It Be Learned? Nov 30, 2018 Systematic Generalization Visual Question Answering (VQA)
Optimal training of variational quantum algorithms without barren plateaus Apr 29, 2021 Quantum Machine Learning Visual Question Answering (VQA)
CAST: Cross-modal Alignment Similarity Test for Vision Language Models Sep 17, 2024 cross-modal alignment Question Answering
T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation Mar 14, 2025 Attribute Question Answering
Dual Attention Networks for Multimodal Reasoning and Matching Nov 2, 2016 Collaborative Inference Image-text matching