| The meaning of "most" for visual question answering models | Dec 31, 2018 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| The Meaning of ``Most'' for Visual Question Answering Models | Aug 1, 2019 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering | Jan 13, 2025 | Common Sense ReasoningQuestion Answering | —Unverified | 0 |
| The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions | Dec 16, 2016 | BIG-bench Machine LearningQuestion Answering | —Unverified | 0 |
| The Wisdom of MaSSeS: Majority, Subjectivity, and Semantic Similarity in the Evaluation of VQA | Sep 12, 2018 | Question AnsweringSemantic Similarity | —Unverified | 0 |
| TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems | Mar 9, 2025 | Multimodal Sentiment AnalysisQuestion Answering | —Unverified | 0 |
| TinyDrive: Multiscale Visual Question Answering with Selective Token Routing for Autonomous Driving | May 21, 2025 | Autonomous DrivingQuestion Answering | —Unverified | 0 |
| TinyVQA: Compact Multimodal Deep Neural Network for Visual Question Answering on Resource-Constrained Devices | Apr 4, 2024 | QuantizationQuestion Answering | —Unverified | 0 |
| TM-PATHVQA:90000+ Textless Multilingual Questions for Medical Visual Question Answering | Jul 16, 2024 | Medical Visual Question AnsweringQuestion Answering | —Unverified | 0 |
| TokenFocus-VQA: Enhancing Text-to-Image Alignment with Position-Aware Focus and Multi-Perspective Aggregations on LVLMs | Apr 10, 2025 | Ensemble LearningPosition | —Unverified | 0 |
| Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering | Sep 21, 2022 | Image CaptioningOptical Character Recognition (OCR) | —Unverified | 0 |
| Toward Effective Reinforcement Learning Fine-Tuning for Medical VQA in Vision-Language Models | May 20, 2025 | Medical Visual Question AnsweringQuestion Answering | —Unverified | 0 |
| Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering | Jan 25, 2023 | DecoderExplanation Generation | —Unverified | 0 |
| Towards Automated Error Analysis: Learning to Characterize Errors | Jan 13, 2022 | Common Sense ReasoningMeta-Learning | —Unverified | 0 |
| Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing | Dec 16, 2019 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| Towards Complex Document Understanding By Discrete Reasoning | Jul 25, 2022 | document understandingQuestion Answering | —Unverified | 0 |
| Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation | Sep 10, 2021 | Knowledge DistillationQuestion Answering | —Unverified | 0 |
| Towards Escaping from Language Bias and OCR Error: Semantics-Centered Text Visual Question Answering | Mar 24, 2022 | Optical Character RecognitionOptical Character Recognition (OCR) | —Unverified | 0 |
| Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture | Jan 1, 2022 | Question AnsweringVisual Question Answering | —Unverified | 0 |
| Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models | Aug 18, 2023 | Image-text matchingObject Localization | —Unverified | 0 |
| Towards Human-Level Understanding of Complex Process Engineering Schematics: A Pedagogical, Introspective Multi-Agent Framework for Open-Domain Question Answering | Aug 24, 2024 | knowledge editingOpen-Domain Question Answering | —Unverified | 0 |
| Towards Models that Can See and Read | Jan 18, 2023 | DecoderImage Captioning | —Unverified | 0 |
| Towards Omnidirectional Reasoning with 360-R1: A Dataset, Benchmark, and GRPO-based Method | May 20, 2025 | HallucinationObject Localization | —Unverified | 0 |
| Towards Reasoning-Aware Explainable VQA | Nov 9, 2022 | DecoderExplanation Generation | —Unverified | 0 |
| Towards Semantic Equivalence of Tokenization in Multimodal LLM | Jun 7, 2024 | Visual Question Answering | —Unverified | 0 |