| Take A Step Back: Rethinking the Two Stages in Visual Reasoning | Jul 29, 2024 | Logical Reasoning, Question Answering | —Unverified | 0 |
| Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded | Feb 11, 2019 | Image Captioning, Question Answering | —Unverified | 0 |
| Talking to the brain: Using Large Language Models as Proxies to Model Brain Semantic Representation | Feb 26, 2025 | Question Answering, valid | —Unverified | 0 |
| Task-driven Visual Saliency and Attention-based Visual Question Answering | Feb 22, 2017 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Task Formulation Matters When Learning Continuously: A Case Study in Visual Question Answering | Jan 16, 2022 | Continual Learning, Incremental Learning | —Unverified | 0 |
| Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference | Mar 17, 2025 | Feature Compression, Image Compression | —Unverified | 0 |
| Task-Oriented Multi-User Semantic Communications | Dec 19, 2021 | Image Retrieval, Machine Translation | —Unverified | 0 |
| Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks | May 5, 2025 | Question Answering, Semantic Communication | —Unverified | 0 |
| Task Progressive Curriculum Learning for Robust Visual Question Answering | Nov 26, 2024 | Data Augmentation, Ensemble Learning | —Unverified | 0 |
| TA-Student VQA: Multi-Agents Training by Self-Questioning | Jun 1, 2020 | Diversity, Question Answering | —Unverified | 0 |
| Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions | Jan 27, 2018 | Attribute, Image Captioning | —Unverified | 0 |
| Tell Me the Evidence? Dual Visual-Linguistic Interaction for Answer Grounding | Jun 21, 2022 | Decoder, Question Answering | —Unverified | 0 |
| Test-Time Adaptation for Visual Document Understanding | Jun 15, 2022 | document understanding, Domain Adaptation | —Unverified | 0 |
| Text-Aware Dual Routing Network for Visual Question Answering | Nov 17, 2022 | Optical Character Recognition, Optical Character Recognition (OCR) | —Unverified | 0 |
| Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles | Jan 1, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Text-Guided Coarse-to-Fine Fusion Network for Robust Remote Sensing Visual Question Answering | Nov 24, 2024 | Question Answering, Relational Reasoning | —Unverified | 0 |
| TextMatch: Enhancing Image-Text Consistency Through Multimodal Optimization | Dec 24, 2024 | In-Context Learning, Question Answering | —Unverified | 0 |
| DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering | May 1, 2022 | document understanding, Open-Domain Question Answering | —Unverified | 0 |
| TextSquare: Scaling up Text-Centric Visual Instruction Tuning | Apr 19, 2024 | Hallucination, Hallucination Evaluation | —Unverified | 0 |
| Textually Enriched Neural Module Networks for Visual Question Answering | Sep 23, 2018 | Image Captioning, Question Answering | —Unverified | 0 |
| TextVidBench: A Benchmark for Long Video Scene Text Understanding | Jun 5, 2025 | Prompt Engineering, Question Answering | —Unverified | 0 |
| The Color of the Cat is Gray: 1 Million Full-Sentences Visual Question Answering (FSVQA) | Sep 21, 2016 | Question Answering, Sentence | —Unverified | 0 |
| The curse of language biases in remote sensing VQA: the role of spatial attributes, language diversity, and the need for clear evaluation | Nov 28, 2023 | Diversity, Question Answering | —Unverified | 0 |
| The Forgettable-Watcher Model for Video Question Answering | May 3, 2017 | model, Question Answering | —Unverified | 0 |
| The Impact of Explanations on AI Competency Prediction in VQA | Jul 2, 2020 | AI Agent, Language Modeling | —Unverified | 0 |
| The meaning of "most" for visual question answering models | Dec 31, 2018 | Question Answering, Visual Question Answering | —Unverified | 0 |
| The Meaning of "Most" for Visual Question Answering Models | Aug 1, 2019 | Question Answering, Visual Question Answering | —Unverified | 0 |
| The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering | Jan 13, 2025 | Common Sense Reasoning, Question Answering | —Unverified | 0 |
| The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions | Dec 16, 2016 | BIG-bench Machine Learning, Question Answering | —Unverified | 0 |
| The Wisdom of MaSSeS: Majority, Subjectivity, and Semantic Similarity in the Evaluation of VQA | Sep 12, 2018 | Question Answering, Semantic Similarity | —Unverified | 0 |
| TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems | Mar 9, 2025 | Multimodal Sentiment Analysis, Question Answering | —Unverified | 0 |
| TinyDrive: Multiscale Visual Question Answering with Selective Token Routing for Autonomous Driving | May 21, 2025 | Autonomous Driving, Question Answering | —Unverified | 0 |
| TinyVQA: Compact Multimodal Deep Neural Network for Visual Question Answering on Resource-Constrained Devices | Apr 4, 2024 | Quantization, Question Answering | —Unverified | 0 |
| TM-PATHVQA:90000+ Textless Multilingual Questions for Medical Visual Question Answering | Jul 16, 2024 | Medical Visual Question Answering, Question Answering | —Unverified | 0 |
| TokenFocus-VQA: Enhancing Text-to-Image Alignment with Position-Aware Focus and Multi-Perspective Aggregations on LVLMs | Apr 10, 2025 | Ensemble Learning, Position | —Unverified | 0 |
| Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering | Sep 21, 2022 | Image Captioning, Optical Character Recognition (OCR) | —Unverified | 0 |
| Toward Effective Reinforcement Learning Fine-Tuning for Medical VQA in Vision-Language Models | May 20, 2025 | Medical Visual Question Answering, Question Answering | —Unverified | 0 |
| Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering | Jan 25, 2023 | Decoder, Explanation Generation | —Unverified | 0 |
| Towards Automated Error Analysis: Learning to Characterize Errors | Jan 13, 2022 | Common Sense Reasoning, Meta-Learning | —Unverified | 0 |
| Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing | Dec 16, 2019 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Towards Complex Document Understanding By Discrete Reasoning | Jul 25, 2022 | document understanding, Question Answering | —Unverified | 0 |
| Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation | Sep 10, 2021 | Knowledge Distillation, Question Answering | —Unverified | 0 |
| Towards Escaping from Language Bias and OCR Error: Semantics-Centered Text Visual Question Answering | Mar 24, 2022 | Optical Character Recognition, Optical Character Recognition (OCR) | —Unverified | 0 |
| Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture | Jan 1, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models | Aug 18, 2023 | Image-text matching, Object Localization | —Unverified | 0 |
| Towards Human-Level Understanding of Complex Process Engineering Schematics: A Pedagogical, Introspective Multi-Agent Framework for Open-Domain Question Answering | Aug 24, 2024 | knowledge editing, Open-Domain Question Answering | —Unverified | 0 |
| Towards Models that Can See and Read | Jan 18, 2023 | Decoder, Image Captioning | —Unverified | 0 |
| Towards Omnidirectional Reasoning with 360-R1: A Dataset, Benchmark, and GRPO-based Method | May 20, 2025 | Hallucination, Object Localization | —Unverified | 0 |
| Towards Reasoning-Aware Explainable VQA | Nov 9, 2022 | Decoder, Explanation Generation | —Unverified | 0 |
| Towards Semantic Equivalence of Tokenization in Multimodal LLM | Jun 7, 2024 | Visual Question Answering | —Unverified | 0 |