CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers May 27, 2023 Image Captioning Image Retrieval
Code Code Available 15 Cross-Modality Relevance for Reasoning on Language and Vision May 12, 2020 Question Answering Visual Question Answering
Code Code Available 15 Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations Dec 8, 2022 Explanation Generation Visual Entailment
Code Code Available 15 CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions Dec 8, 2020 counterfactual Descriptive
Code Code Available 15 Counterfactual VQA: A Cause-Effect Look at Language Bias Jun 8, 2020 Causal Inference counterfactual
Code Code Available 15 An Empirical Study of Training End-to-End Vision-and-Language Transformers Nov 3, 2021 Cross-Modal Retrieval Decoder
Code Code Available 15 Cross-modal Retrieval for Knowledge-based Visual Question Answering Jan 11, 2024 Cross-Modal Retrieval Question Answering
Code Code Available 15 HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles Dec 18, 2023 Question Answering Visual Question Answering
Code Code Available 15 HIDRO-VQA: High Dynamic Range Oracle for Video Quality Assessment Nov 18, 2023 Video Quality Assessment Visual Question Answering (VQA)
Code Code Available 15 An Empirical Study of Multimodal Model Merging Apr 28, 2023 model Retrieval
Code Code Available 15 An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA Sep 10, 2021 Image Captioning Question Answering
Code Code Available 15 COSA: Concatenated Sample Pretrained Vision-Language Foundation Model Jun 15, 2023 Form model
Code Code Available 15 An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling Sep 4, 2022 Fill Mask Optical Flow Estimation
Code Code Available 15 A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering Nov 13, 2023 Decision Making Explanation Generation
Code Code Available 15 Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering Oct 3, 2021 counterfactual Diagnostic
Code Code Available 15 Greedy Gradient Ensemble for Robust Visual Question Answering Jul 27, 2021 Question Answering Visual Question Answering
Code Code Available 15 Counterfactual Samples Synthesizing for Robust Visual Question Answering Mar 14, 2020 counterfactual Question Answering
Code Code Available 15 3D-Aware Visual Question Answering about Parts, Poses and Occlusions Oct 27, 2023 Question Answering Visual Question Answering
Code Code Available 15 Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question Answering Jul 13, 2021 Navigate Question Answering
Code Code Available 15 An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models Nov 9, 2024 object-detection Object Detection
Code Code Available 15 Visual Grounding Methods for VQA are Working for the Wrong Reasons! Apr 12, 2020 Question Answering Visual Grounding
Code Code Available 15 A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports Sep 3, 2020 Image-text Retrieval Medical Visual Question Answering
Code Code Available 15 Graph Optimal Transport for Cross-Domain Alignment Jun 26, 2020 Graph Matching Image Captioning
Code Code Available 15 GRIT: General Robust Image Task Benchmark Apr 28, 2022 Instance Segmentation Keypoint Detection
Code Code Available 15 Hierarchical Conditional Relation Networks for Video Question Answering Feb 25, 2020 Audio-Visual Question Answering (AVQA) Question Answering
Code Code Available 15 ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax Mar 2, 2023 Descriptive Image Captioning
Code Code Available 15 Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer Feb 18, 2021 Decoder Document Image Classification
Code Code Available 15 GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering Feb 25, 2019 Question Answering Visual Question Answering (VQA)
Code Code Available 15 Consistency-preserving Visual Question Answering in Medical Imaging Jun 27, 2022 Question Answering Visual Question Answering
Code Code Available 15 Content-Rich AIGC Video Quality Assessment via Intricate Text Alignment and Motion-Aware Consistency Feb 6, 2025 Video Generation Video Quality Assessment
Code Code Available 15 Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models Jul 26, 2023 Image Quality Assessment No-Reference Image Quality Assessment
Code Code Available 15 Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts Feb 17, 2021 Caption Generation Diversity
Code Code Available 15 GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution May 27, 2025 8k Avg
Code Code Available 15 Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization Oct 7, 2016 General Classification Image Attribution
Code Code Available 15 Generative Bias for Robust Visual Question Answering Aug 1, 2022 Knowledge Distillation Question Answering
Code Code Available 15 AMD-Hummingbird: Towards an Efficient Text-to-Video Model Mar 24, 2025 Computational Efficiency Video Generation
Code Code Available 15 A Dataset and Baselines for Visual Question Answering on Art Aug 28, 2020 Question Answering Question Generation
Code Code Available 15 Compositional Attention Networks for Machine Reasoning Mar 8, 2018 Referring Expression Comprehension Visual Question Answering (VQA)
Code Code Available 15 Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? Jan 5, 2025 Image Captioning Image to text
Code Code Available 15 Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers Mar 29, 2021 Decoder Image Segmentation
Code Code Available 15 FunQA: Towards Surprising Video Comprehension Jun 26, 2023 Question Answering Text Generation
Code Code Available 15 Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations Feb 10, 2024 Diagnostic Hallucination
Code Code Available 15 ConceptBert: Concept-Aware Representation for Visual Question Answering Nov 1, 2020 Common Sense Reasoning Question Answering
Code Code Available 15 Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment Feb 21, 2024 Language Modelling Question Answering
Code Code Available 15 Contrast and Classify: Training Robust VQA Models Oct 13, 2020 Contrastive Learning Data Augmentation
Code Code Available 15 2BiVQA: Double Bi-LSTM based Video Quality Assessment of UGC Videos Aug 31, 2022 Video Quality Assessment Visual Question Answering (VQA)
Code Code Available 15 Combo of Thinking and Observing for Outside-Knowledge VQA May 10, 2023 Decoder Question Answering
Code Code Available 15 Attention in Reasoning: Dataset, Analysis, and Modeling Apr 20, 2022 Question Answering Visual Question Answering
Code Code Available 15 GeneAnnotator: A Semi-automatic Annotation Tool for Visual Scene Graph Sep 6, 2021 Graph Generation Graph Learning
Code Code Available 15 Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator Dec 11, 2023 Image Captioning Question Answering
Code Code Available 15