Greedy Gradient Ensemble for Robust Visual Question Answering Jul 27, 2021 Question Answering Visual Question Answering
Change Detection Meets Visual Question Answering Dec 12, 2021 Answer Generation Change Detection
Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering May 25, 2025 Anatomy Benchmarking
GRIT: General Robust Image Task Benchmark Apr 28, 2022 Instance Segmentation Keypoint Detection
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization Oct 7, 2016 General Classification Image Attribution
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering Feb 25, 2019 Question Answering Visual Question Answering (VQA)
GraghVQA: Language-Guided Graph Neural Networks for Graph-based Visual Question Answering Apr 20, 2021 Graph Neural Network Graph Question Answering
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer Feb 18, 2021 Decoder Document Image Classification
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models Oct 16, 2021 Image Captioning Language Modeling
Graph Optimal Transport for Cross-Domain Alignment Jun 26, 2020 Graph Matching Image Captioning
Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question Answering Jul 13, 2021 Navigate Question Answering
HallE-Control: Controlling Object Hallucination in Large Multimodal Models Oct 3, 2023 Attribute Decoder
Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations Dec 8, 2022 Explanation Generation Visual Entailment
Check It Again: Progressive Visual Question Answering via Visual Entailment Jun 8, 2021 Question Answering Visual Entailment
Check It Again:Progressive Visual Question Answering via Visual Entailment Aug 1, 2021 Question Answering Visual Entailment
ChipQA: No-Reference Video Quality Prediction via Space-Time Chips Sep 17, 2021 Video Quality Assessment Visual Question Answering (VQA)
ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding Aug 5, 2022 Image Retrieval Question Answering
Hierarchical multimodal transformers for Multi-Page DocVQA Dec 7, 2022 Decoder Question Answering
How to Configure Good In-Context Sequence for Visual Question Answering Dec 4, 2023 In-Context Learning Question Answering
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling Nov 23, 2021 Image Captioning Image Description
Classification-Regression for Chart Comprehension Nov 29, 2021 Chart Question Answering Classification
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision Nov 17, 2022 Image Captioning Question Answering
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator Dec 11, 2023 Image Captioning Question Answering
Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation Dec 22, 2021 Common Sense Reasoning Question Answering
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning Dec 20, 2016 Diagnostic Question Answering
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts Feb 17, 2021 Caption Generation Diversity
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers Mar 29, 2021 Decoder Image Segmentation
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution May 27, 2025 8k Avg
ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models Oct 7, 2024 Question Answering Visual Question Answering
ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax Mar 2, 2023 Descriptive Image Captioning
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations Apr 5, 2022 Explanation Generation Question Answering
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM Nov 26, 2024 Benchmarking Text-to-Video Generation
InfMLLM: A Unified Framework for Visual-Language Tasks Nov 12, 2023 GPU Image Captioning
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? Jan 5, 2025 Image Captioning Image to text
AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results Aug 21, 2024 Image Manipulation valid
Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering Oct 3, 2021 counterfactual Diagnostic
Introspective Distillation for Robust Question Answering Nov 1, 2021 counterfactual Inductive Bias
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering Jun 16, 2023 Image Captioning Question Answering
Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning Jun 11, 2020 Question Answering Reinforcement Learning (RL)
Clover: Towards A Unified Video-Language Alignment and Fusion Model Jul 16, 2022 Language Modeling Language Modelling
Coarse-to-Fine Reasoning for Visual Question Answering Oct 6, 2021 Question Answering Visual Question Answering
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone Jun 15, 2022 Described Object Detection Image Captioning
Counterfactual Samples Synthesizing for Robust Visual Question Answering Mar 14, 2020 counterfactual Question Answering
Kosmos-2: Grounding Multimodal Large Language Models to the World Jun 26, 2023 Image Captioning In-Context Learning
CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery Jul 11, 2023 Question Answering Scene Understanding
GeneAnnotator: A Semi-automatic Annotation Tool for Visual Scene Graph Sep 6, 2021 Graph Generation Graph Learning
COBRA: Contrastive Bi-Modal Representation Algorithm May 7, 2020 Cross-Modal Retrieval Image Captioning
CoCa: Contrastive Captioners are Image-Text Foundation Models May 4, 2022 Action Classification Decoder
Counterfactual VQA: A Cause-Effect Look at Language Bias Jun 8, 2020 Causal Inference counterfactual
Generative Bias for Robust Visual Question Answering Aug 1, 2022 Knowledge Distillation Question Answering