| Toward Multi-Granularity Decision-Making: Explicit Visual Reasoning with Hierarchical Knowledge | Jan 1, 2023 | Decision Making, Question Answering | Code Available | 0 |
| Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Language Tasks via Semantic Grounding | Sep 1, 2023 | Graph Generation, Image Captioning | Code Available | 0 |
| Dual Attention Networks for Multimodal Reasoning and Matching | Nov 2, 2016 | Collaborative Inference, Image-text matching | Code Available | 0 |
| Recommending Themes for Ad Creative Design via Visual-Linguistic Representations | Jan 20, 2020 | Question Answering, Recommendation Systems | Code Available | 0 |
| DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Jun 26, 2025 | document understanding, Optical Character Recognition (OCR) | Code Available | 0 |
| Recursive Visual Attention in Visual Dialog | Dec 6, 2018 | Question Answering, Visual Dialog | Code Available | 0 |
| Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models | Jul 22, 2024 | Disentanglement, Question Answering | Code Available | 0 |
| ReDiT: Re‑evaluating large visual question answering model confidence by defining input scenario Difficulty and applying Temperature mapping | Jan 6, 2025 | Question Answering, Visual Question Answering | Code Available | 0 |
| Towards a performance analysis on pre-trained Visual Question Answering models for autonomous driving | Jul 18, 2023 | Autonomous Driving, Model Selection | Code Available | 0 |
| Cascaded Mutual Modulation for Visual Reasoning | Sep 6, 2018 | Question Answering, Visual Question Answering | Code Available | 0 |