HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles (Dec 18, 2023). Tasks: Question Answering, Visual Question Answering
HIDRO-VQA: High Dynamic Range Oracle for Video Quality Assessment (Nov 18, 2023). Tasks: Video Quality Assessment, Visual Question Answering (VQA)
Greedy Gradient Ensemble for Robust Visual Question Answering (Jul 27, 2021). Tasks: Question Answering, Visual Question Answering
Graph Optimal Transport for Cross-Domain Alignment (Jun 26, 2020). Tasks: Graph Matching, Image Captioning
GRIT: General Robust Image Task Benchmark (Apr 28, 2022). Tasks: Instance Segmentation, Keypoint Detection
GraghVQA: Language-Guided Graph Neural Networks for Graph-based Visual Question Answering (Apr 20, 2021). Tasks: Graph Neural Network, Graph Question Answering
Cross-modal Retrieval for Knowledge-based Visual Question Answering (Jan 11, 2024). Tasks: Cross-Modal Retrieval, Question Answering
Combo of Thinking and Observing for Outside-Knowledge VQA (May 10, 2023). Tasks: Decoder, Question Answering
Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question Answering (Jul 13, 2021). Tasks: Navigate, Question Answering
Hierarchical Conditional Relation Networks for Video Question Answering (Feb 25, 2020). Tasks: Audio-Visual Question Answering (AVQA), Question Answering
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer (Feb 18, 2021). Tasks: Decoder, Document Image Classification
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering (Feb 25, 2019). Tasks: Question Answering, Visual Question Answering (VQA)
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale (Dec 6, 2024). Tasks: Multimodal Reasoning, Visual Question Answering
Many Heads but One Brain: Fusion Brain -- a Competition and a Single Multimodal Multitask Architecture (Nov 22, 2021). Tasks: Handwritten Text Recognition, object-detection
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model (Oct 11, 2022). Tasks: Contrastive Learning, Image-text matching
MapQA: A Dataset for Question Answering on Choropleth Maps (Nov 15, 2022). Tasks: Articles, Question Answering
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator (Dec 11, 2023). Tasks: Image Captioning, Question Answering
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers (Mar 29, 2021). Tasks: Decoder, Image Segmentation
MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks (May 18, 2025). Tasks: Benchmarking, Medical Visual Question Answering
MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts (May 18, 2023). Tasks: Medical Visual Question Answering, Question Answering
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution (May 27, 2025). Tasks: 8k, Avg
MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models (Sep 23, 2024). Tasks: Medical Visual Question Answering, Question Answering
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (Mar 23, 2023). Tasks: Auxiliary Learning, Multimodal Sentiment Analysis
Meta-Learning via Classifier(-free) Diffusion Guidance (Oct 17, 2022). Tasks: Few-Shot Learning, Image Generation
ConceptBert: Concept-Aware Representation for Visual Question Answering (Nov 1, 2020). Tasks: Common Sense Reasoning, Question Answering
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts (Feb 17, 2021). Tasks: Caption Generation, Diversity
Cross-Modality Relevance for Reasoning on Language and Vision (May 12, 2020). Tasks: Question Answering, Visual Question Answering
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge (May 31, 2019). Tasks: object-detection, Object Detection
Consistency-preserving Visual Question Answering in Medical Imaging (Jun 27, 2022). Tasks: Question Answering, Visual Question Answering
Content-Rich AIGC Video Quality Assessment via Intricate Text Alignment and Motion-Aware Consistency (Feb 6, 2025). Tasks: Video Generation, Video Quality Assessment
Deep Multimodal Neural Architecture Search (Apr 25, 2020). Tasks: Decoder, Image-text matching
ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax (Mar 2, 2023). Tasks: Descriptive, Image Captioning
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (Oct 7, 2016). Tasks: General Classification, Image Attribution
Hierarchical multimodal transformers for Multi-Page DocVQA (Dec 7, 2022). Tasks: Decoder, Question Answering
Contrast and Classify: Training Robust VQA Models (Oct 13, 2020). Tasks: Contrastive Learning, Data Augmentation
2BiVQA: Double Bi-LSTM based Video Quality Assessment of UGC Videos (Aug 31, 2022). Tasks: Video Quality Assessment, Visual Question Answering (VQA)
Interpreting Chest X-rays Like a Radiologist: A Benchmark with Clinical Reasoning (May 29, 2025). Tasks: Diagnostic, Question Answering
FunQA: Towards Surprising Video Comprehension (Jun 26, 2023). Tasks: Question Answering, Text Generation
Detecting and Preventing Hallucinations in Large Vision Language Models (Aug 11, 2023). Tasks: 16k, Hallucination
MMUnlearner: Reformulating Multimodal Machine Unlearning in the Era of Multimodal Large Language Models (Feb 16, 2025). Tasks: Language Modeling, Language Modelling
Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations (Feb 10, 2024). Tasks: Diagnostic, Hallucination
Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? (Feb 23, 2023). Tasks: Open-Domain Question Answering, Question Answering
Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering (Oct 3, 2021). Tasks: counterfactual, Diagnostic
Counterfactual Samples Synthesizing for Robust Visual Question Answering (Mar 14, 2020). Tasks: counterfactual, Question Answering
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge (Jun 3, 2022). Tasks: Question Answering, Visual Question Answering
Counterfactual VQA: A Cause-Effect Look at Language Bias (Jun 8, 2020). Tasks: Causal Inference, counterfactual
From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis (Jun 28, 2024). Tasks: Visual Question Answering (VQA), Visual Reasoning
GeneAnnotator: A Semi-automatic Annotation Tool for Visual Scene Graph (Sep 6, 2021). Tasks: Graph Generation, Graph Learning
FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture (Jun 16, 2024). Tasks: Diversity, Multiple-choice
Can I Trust Your Answer? Visually Grounded Video Question Answering (Sep 4, 2023). Tasks: Grounded Video Question Answering, Question Answering