PitVQA++: Vector Matrix-Low-Rank Adaptation for Open-Ended Visual Question Answering in Pituitary Surgery Feb 19, 2025 Question Answering Visual Question Answering
Knowledge Guided Semi-Supervised Learning for Quality Assessment of User Generated Videos Dec 24, 2023 Representation Learning Transfer Learning
Knowledge Generation for Zero-shot Knowledge-based VQA Feb 4, 2024 Question Answering Visual Question Answering
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering Sep 13, 2021 Data Augmentation Question Answering
What Can Neural Networks Reason About? May 30, 2019 Question Answering Visual Question Answering
Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training Oct 17, 2022 Image Captioning Network Interpretation
Knowing Earlier what Right Means to You: A Comprehensive VQA Dataset for Grounding Relative Directions via Multi-Task Learning Jul 6, 2022 Diagnostic Multi-Task Learning
'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks Mar 28, 2021 Question Answering Visual Question Answering
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images Sep 19, 2024 Hallucination Image Captioning
Bridge the Gap Between VQA and Human Behavior on Omnidirectional Video: A Large-Scale Dataset and a Deep Learning Model Jul 29, 2018 Visual Question Answering (VQA)
Joint Visual and Text Prompting for Improved Object-Centric Perception with Multimodal Large Language Models Apr 6, 2024 MME Object
Pragmatic Issue-Sensitive Image Captioning Apr 29, 2020 Descriptive Image Captioning
Temporal Reasoning via Audio Question Answering Nov 21, 2019 Audio Question Answering Diagnostic
Joint Answering and Explanation for Visual Commonsense Reasoning Feb 25, 2022 Knowledge Distillation Question Answering
Breaking Annotation Barriers: Generalized Video Quality Assessment via Ranking-based Self-Supervision May 6, 2025 Learning-To-Rank Self-Supervised Learning
Is Multimodal Vision Supervision Beneficial to Language? Feb 10, 2023 Image Retrieval Natural Language Understanding
Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering Apr 22, 2024 Language Modeling Language Modelling
BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection Jan 31, 2019 Question Answering Relationship Detection
Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering Jun 28, 2023 Passage Retrieval Question Answering
Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays Feb 14, 2024 Language Modeling Language Modelling
Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following Jun 4, 2024 Question Answering Visual Question Answering
IQ-VQA: Intelligent Visual Question Answering Jul 8, 2020 Question Answering Visual Question Answering
Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs Oct 15, 2024 Image Description Multiple-choice
Blind VQA on 360° Video via Progressively Learning from Pixels, Frames and Video Nov 18, 2021 Visual Question Answering (VQA)
Blind Prediction of Natural Video Quality Jan 9, 2014 Prediction Video Quality Assessment
Differential Attention for Visual Question Answering Apr 1, 2018 Question Answering Visual Question Answering
IQA: Visual Question Answering in Interactive Environments Dec 9, 2017 Navigate Reinforcement Learning
A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering Oct 1, 2022 Medical Visual Question Answering Question Answering
Visual Robustness Benchmark for Visual Question Answering (VQA) Jul 3, 2024 Visual Question Answering Visual Question Answering (VQA)
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation May 16, 2025 Benchmarking Ethics
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering Apr 14, 2017 Question Answering Visual Question Answering
iParaphrasing: Extracting Visually Grounded Paraphrases via an Image Jun 12, 2018 Image Captioning Question Answering
What is Right for Me is Not Yet Right for You: A Dataset for Grounding Relative Directions via Multi-Task Learning May 5, 2022 Multi-Task Learning Question Answering
Biomedical Visual Instruction Tuning with Clinician Preference Alignment Jun 19, 2024 Instruction Following Visual Question Answering (VQA)
Intrinsic Subgraph Generation for Interpretable Graph based Visual Question Answering Mar 26, 2024 Decision Making Explainable artificial intelligence
Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models Mar 22, 2025 Question Answering Visual Question Answering
What is the Visual Cognition Gap between Humans and Multimodal LLMs? Jun 14, 2024 object-detection Object Detection
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images Apr 16, 2024 Multimodal Deep Learning Optical Character Recognition (OCR)
BioD2C: A Dual-level Semantic Consistency Constraint Framework for Biomedical VQA Mar 4, 2025 Medical Diagnosis Question Answering
Interpretable Visual Reasoning via Induced Symbolic Space Nov 23, 2020 Visual Question Answering (VQA) Visual Reasoning
Viewport Proposal CNN for 360° Video Quality Assessment Jun 1, 2019 Saliency Prediction Video Quality Assessment
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition Sep 26, 2023 Articles Image Comprehension
VIGNETTE: Socially Grounded Bias Evaluation for Vision-Language Models May 28, 2025 Decision Making Question Answering
BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models Jan 28, 2023 Out-of-Distribution Generalization Question Answering
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision Apr 26, 2019 Image-text Retrieval Object
Integrating Image Features with Convolutional Sequence-to-sequence Network for Multilingual Visual Question Answering Mar 22, 2023 Question Answering Visual Question Answering
The Promise of Premise: Harnessing Question Premises in Visual Question Answering May 1, 2017 Question Answering Relevance Detection
Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis Jan 31, 2024 Multi-Task Learning Question Answering
Are Red Roses Red? Evaluating Consistency of Question-Answering Models Jul 1, 2019 Question Answering valid
Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis Feb 11, 2023 Image-text Retrieval Knowledge Graphs