| Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment | Feb 7, 2025 | Diversity, Human-Object Interaction Detection | —Unverified | 0 | 0 |
| Hyperbolic Attention Networks | May 24, 2018 | Machine Translation, Question Answering | —Unverified | 0 | 0 |
| Hyper-dimensional computing for a visual question-answering system that is trainable end-to-end | Nov 28, 2017 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Graph Relation Transformer: Incorporating pairwise object features into the Transformer architecture | Nov 11, 2021 | Graph Attention, Question Answering | —Unverified | 0 | 0 |
| Bilinear Graph Networks for Visual Question Answering | Jul 23, 2019 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Hypo3D: Exploring Hypothetical Reasoning in 3D | Feb 2, 2025 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Graph Neural Networks in Vision-Language Image Understanding: A Survey | Mar 7, 2023 | Image Captioning, Image Retrieval | —Unverified | 0 | 0 |
| Graph-based Heuristic Search for Module Selection Procedure in Neural Module Network | Sep 30, 2020 | Heuristic Search, Question Answering | —Unverified | 0 | 0 |
| ICDAR 2019 Competition on Scene Text Visual Question Answering | Jun 30, 2019 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| GRAM: Global Reasoning for Multi-Page VQA | Jan 7, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| i-Code Studio: A Configurable and Composable Framework for Integrative AI | May 23, 2023 | Question Answering, Retrieval | —Unverified | 0 | 0 |
| GRADE: Quantifying Sample Diversity in Text-to-Image Models | Oct 29, 2024 | Attribute, Diversity | —Unverified | 0 | 0 |
| GPT-4V Explorations: Mining Autonomous Driving | Jun 24, 2024 | Autonomous Driving, Decision Making | —Unverified | 0 | 0 |
| Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing | Apr 8, 2020 | Diversity, Question Answering | —Unverified | 0 | 0 |
| What is needed for simple spatial language capabilities in VQA? | Aug 17, 2019 | Diagnostic, Question Answering | —Unverified | 0 | 0 |
| Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning | Oct 21, 2019 | Data Augmentation, Decision Making | —Unverified | 0 | 0 |
| ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance | Dec 9, 2024 | Image Generation, Language Modeling | —Unverified | 0 | 0 |
| Understanding the Role of Scene Graphs in Visual Question Answering | Jan 14, 2021 | Graph Generation, Question Answering | —Unverified | 0 | 0 |
| Goal-Oriented Semantic Communication for Wireless Visual Question Answering | Nov 3, 2024 | Edge-computing, Question Answering | —Unverified | 0 | 0 |
| ZALM3: Zero-Shot Enhancement of Vision-Language Alignment via In-Context Information in Multi-Turn Multimodal Medical Dialogue | Sep 26, 2024 | Medical Visual Question Answering, Question Answering | —Unverified | 0 | 0 |
| Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension | Jul 1, 2017 | Question Answering, Reading Comprehension | —Unverified | 0 | 0 |
| CLIPPO: Image-and-Language Understanding from Pixels Only | Dec 15, 2022 | Contrastive Learning, image-classification | —Unverified | 0 | 0 |
| UnICLAM: Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering | Dec 21, 2022 | Data Augmentation, Decision Making | —Unverified | 0 | 0 |
| Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks | Jan 1, 2023 | Cross-Modal Retrieval, Image Captioning | —Unverified | 0 | 0 |
| Image Captioning and Visual Question Answering Based on Attributes and External Knowledge | Mar 9, 2016 | General Knowledge, Image Captioning | —Unverified | 0 | 0 |