Focal Visual-Text Attention for Visual Question Answering Jun 5, 2018 Memex Question Answering Question Answering
Focal Visual-Text Attention for Memex Question Answering Dec 14, 2018 Memex Question Answering Question Answering
Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering Jul 28, 2023 Question Answering Visual Question Answering
A Diagram Is Worth A Dozen Images Mar 24, 2016 Visual Question Answering (VQA)
A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models Aug 2, 2017 Question Answering Visual Question Answering
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models Oct 17, 2023 Attribute Question Answering
Contextual Dropout: An Efficient Sample-Dependent Dropout Module Mar 6, 2021 image-classification Image Classification
A Simple Baseline for Knowledge-Based Visual Question Answering Oct 20, 2023 In-Context Learning Question Answering
Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering Mar 9, 2021 Optical Character Recognition (OCR) Question Answering
Self-Critical Reasoning for Robust Visual Question Answering May 24, 2019 Question Answering Visual Question Answering
Adaptively Clustering Neighbor Elements for Image-Text Generation Jan 5, 2023 Clustering Decoder
Zero-shot Translation of Attention Patterns in VQA Models to Natural Language Nov 8, 2023 Image Captioning Language Modeling
Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions Nov 20, 2023 Question Answering Visual Question Answering
Uncovering the Full Potential of Visual Grounding Methods in VQA Jan 15, 2024 Question Answering Visual Grounding
Self Supervision for Attention Networks Jan 6, 2021 image-classification Image Classification
ArtQuest: Countering Hidden Language Biases in ArtVQA Jan 4, 2024 Question Answering Visual Question Answering
Analyzing Modular Approaches for Visual Question Decomposition Nov 10, 2023 Code Generation Visual Question Answering (VQA)
Semantically Distributed Robust Optimization for Vision-and-Language Inference Oct 14, 2021 Data Augmentation Natural Language Inference
Semantically Equivalent Adversarial Rules for Debugging NLP models Jul 1, 2018 Data Augmentation Question Answering
Adaptive loose optimization for robust question answering May 6, 2023 Extractive Question-Answering Machine Reading Comprehension
FigureQA: An Annotated Figure Dataset for Visual Reasoning Oct 19, 2017 BIG-bench Machine Learning Chart Question Answering
SemiHVision: Enhancing Medical Multimodal Models with a Semi-Human Annotated Dataset and Fine-Tuned Instruction Generation Oct 19, 2024 Diagnostic GPU
Understanding Attention for Vision-and-Language Tasks Aug 17, 2022 Image Generation Image Retrieval
Understanding Guided Image Captioning Performance across Domains Dec 4, 2020 Descriptive Image Captioning
Separate and Locate: Rethink the Text in Text-based Visual Question Answering Aug 31, 2023 Optical Character Recognition (OCR) Position
Few-Shot Multimodal Explanation for Visual Question Answering Oct 28, 2024 Explainable artificial intelligence Explainable Artificial Intelligence (XAI)
An Entropy Clustering Approach for Assessing Visual Question Difficulty Apr 12, 2020 Clustering Question Answering
Adapting Lightweight Vision Language Models for Radiological Visual Question Answering Jun 17, 2025 Diagnostic Question Answering
ShapeWorld - A new test methodology for multimodal language understanding Apr 14, 2017 Multimodal Deep Learning Visual Question Answering
Visual Question Answering: A Survey of Methods and Datasets Jul 20, 2016 General Knowledge Survey
Federated Document Visual Question Answering: A Pilot Study May 10, 2024 Federated Learning Question Answering
Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering Apr 11, 2017 Visual Question Answering Visual Question Answering (VQA)
Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering Nov 17, 2024 Hallucination In-Context Learning
Siamese Tracking with Lingual Object Constraints Nov 23, 2020 Object Object Tracking
Why Did the Chicken Cross the Road? Rephrasing and Analyzing Ambiguous Questions in VQA Nov 14, 2022 Question Generation Question-Generation
Simple Baseline for Visual Question Answering Dec 7, 2015 Visual Question Answering Visual Question Answering (VQA)
Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering Oct 26, 2022 Question Answering Visual Question Answering
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering Jun 6, 2019 Question Answering Video Question Answering
ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions Oct 17, 2024 Visual Question Answering (VQA)
Factor Graph Attention Apr 11, 2019 Graph Attention Question Answering
12-in-1: Multi-Task Vision and Language Representation Learning Dec 5, 2019 10-shot image generation Image Retrieval
VQA Therapy: Exploring Answer Differences by Visually Grounding Answers Aug 21, 2023 Question Answering Visual Question Answering
Single-Stream Multi-Level Alignment for Vision-Language Pretraining Mar 27, 2022 Image-text Retrieval Question Answering
Exploring the Potential of Encoder-free Architectures in 3D LMMs Feb 13, 2025 Inductive Bias Visual Question Answering (VQA)
Why do These Match? Explaining the Behavior of Image Similarity Models May 26, 2019 Attribute General Classification
Exploring the Effectiveness of Video Perceptual Representation in Blind Video Quality Assessment Jul 8, 2022 Video Quality Assessment Visual Question Answering (VQA)
Visual Question Answering: Datasets, Algorithms, and Future Challenges Oct 5, 2016 Question Answering Visual Question Answering
Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos Sep 21, 2022 Action Detection Action Recognition
Exploring Models and Data for Image Question Answering May 8, 2015 Image Segmentation object-detection
SlotPi: Physics-informed Object-centric Reasoning Models Jun 12, 2025 Object Question Answering