Just Ask: Learning to Answer Questions from Millions of Narrated Videos Dec 1, 2020 Question Answering Question Generation
Code Code Available 1LIVE: Learnable In-Context Vector for Visual Question Answering Jun 19, 2024 In-Context Learning Question Answering
Code Code Available 1Improving Selective Visual Question Answering by Learning from Your Peers Jun 14, 2023 Question Answering Visual Question Answering
Code Code Available 1IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models Mar 23, 2024 Common Sense Reasoning In-Context Learning
Code Code Available 1In Defense of Grid Features for Visual Question Answering Jan 10, 2020 Image Captioning Question Answering
Code Code Available 1I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision Nov 17, 2022 Image Captioning Question Answering
Code Code Available 1How to Configure Good In-Context Sequence for Visual Question Answering Dec 4, 2023 In-Context Learning Question Answering
Code Code Available 1How Much Can CLIP Benefit Vision-and-Language Tasks? Jul 13, 2021 Question Answering Vision and Language Navigation
Code Code Available 1IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning Oct 25, 2021 Arithmetic Reasoning Mathematical Question Answering
Code Code Available 1Hierarchical multimodal transformers for Multi-Page DocVQA Dec 7, 2022 Decoder Question Answering
Code Code Available 1HIDRO-VQA: High Dynamic Range Oracle for Video Quality Assessment Nov 18, 2023 Video Quality Assessment Visual Question Answering (VQA)
Code Code Available 1Hierarchical Question-Image Co-Attention for Visual Question Answering May 31, 2016 Visual Dialog Visual Question Answering
Code Code Available 1HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles Dec 18, 2023 Question Answering Visual Question Answering
Code Code Available 1Dynamic Language Binding in Relational Visual Reasoning Apr 30, 2020 Object Question Answering
Code Code Available 1Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations Dec 8, 2022 Explanation Generation Visual Entailment
Code Code Available 1How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs Nov 27, 2023 Adversarial Robustness Visual Question Answering (VQA)
Code Code Available 1InfMLLM: A Unified Framework for Visual-Language Tasks Nov 12, 2023 GPU Image Captioning
Code Code Available 1GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering Feb 25, 2019 Question Answering Visual Question Answering (VQA)
Code Code Available 1GraghVQA: Language-Guided Graph Neural Networks for Graph-based Visual Question Answering Apr 20, 2021 Graph Neural Network Graph Question Answering
Code Code Available 1End-to-end Knowledge Retrieval with Multi-modal Queries Jun 1, 2023 Benchmarking Cross-Modal Retrieval
Code Code Available 1Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question Answering Jul 13, 2021 Navigate Question Answering
Code Code Available 1Are Bias Mitigation Techniques for Deep Learning Effective? Apr 1, 2021 Deep Learning Question Answering
Code Code Available 1Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering Jul 25, 2017 Image Captioning Visual Question Answering
Code Code Available 1DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering Jul 10, 2021 Graph Attention Question Answering
Code Code Available 1End-to-end Document Recognition and Understanding with Dessurt Mar 30, 2022 document understanding Visual Question Answering (VQA)
Code Code Available 1Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization Oct 7, 2016 General Classification Image Attribution
Code Code Available 1Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer Feb 18, 2021 Decoder Document Image Classification
Code Code Available 1GRIT: General Robust Image Task Benchmark Apr 28, 2022 Instance Segmentation Keypoint Detection
Code Code Available 1Graph Optimal Transport for Cross-Domain Alignment Jun 26, 2020 Graph Matching Image Captioning
Code Code Available 1HallE-Control: Controlling Object Hallucination in Large Multimodal Models Oct 3, 2023 Attribute Decoder
Code Code Available 1EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering Dec 19, 2023 Object Object Counting
Code Code Available 1Break It Down: A Question Understanding Benchmark Jan 31, 2020 Open-Domain Question Answering Question Answering
Code Code Available 1Hierarchical Conditional Relation Networks for Video Question Answering Feb 25, 2020 Audio-Visual Question Answering (AVQA) Question Answering
Code Code Available 1Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator Dec 11, 2023 Image Captioning Question Answering
Code Code Available 1Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers Mar 29, 2021 Decoder Image Segmentation
Code Code Available 1GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution May 27, 2025 8k Avg
Code Code Available 1HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning Mar 19, 2024 Reinforcement Learning (RL) Visual Grounding
Code Code Available 1Hypergraph Transformer: Weakly-supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering Apr 22, 2022 Question Answering Visual Question Answering
Code Code Available 1Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA Feb 24, 2024 3D Question Answering (3D-QA) Question Answering
Code Code Available 1IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages Jan 27, 2022 Cross-Modal Retrieval Few-Shot Learning
Code Code Available 1IMPACT: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents Dec 10, 2024 Cross-Modal Retrieval Image Classification
Code Code Available 1Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training Nov 23, 2023 Multimodal Reasoning Science Question Answering
Code Code Available 1Dual-Key Multimodal Backdoors for Visual Question Answering Dec 14, 2021 Question Answering Visual Question Answering
Code Code Available 1EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images Oct 28, 2023 Decision Making Medical Visual Question Answering
Code Code Available 1Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts Apr 12, 2024 Image Captioning Question Answering
Code Code Available 1eP-ALM: Efficient Perceptual Augmentation of Language Models Mar 20, 2023 In-Context Learning Visual Question Answering (VQA)
Code Code Available 1FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs Mar 27, 2025 Attribute Benchmarking
Code Code Available 1InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks Dec 21, 2023 Image Retrieval Image-to-Text Retrieval
Code Code Available 1Introspective Distillation for Robust Question Answering Nov 1, 2021 counterfactual Inductive Bias
Code Code Available 1Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering Sep 19, 2024 Hallucination Hallucination Evaluation
Code Code Available 1