| TA-Student VQA: Multi-Agents Training by Self-Questioning | Jun 1, 2020 | Diversity, Question Answering | —Unverified | 0 | 0 |
| AdaDARE-gamma: Balancing Stability and Plasticity in Multi-modal LLMs through Efficient Adaptation | Jan 1, 2025 | Image Captioning, Question Answering | —Unverified | 0 | 0 |
| Bayesian Attention Belief Networks | Jun 9, 2021 | Decoder, Machine Translation | —Unverified | 0 | 0 |
| 3D Concept Learning and Reasoning from Multi-View Images | Mar 20, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| BARTPhoBEiT: Pre-trained Sequence-to-Sequence and Image Transformers Models for Vietnamese Visual Question Answering | Jul 28, 2023 | Question Answering, Vietnamese Visual Question Answering | —Unverified | 0 | 0 |
| Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions | Jan 27, 2018 | Attribute, Image Captioning | —Unverified | 0 | 0 |
| Tell Me the Evidence? Dual Visual-Linguistic Interaction for Answer Grounding | Jun 21, 2022 | Decoder, Question Answering | —Unverified | 0 | 0 |
| VQA-Aid: Visual Question Answering for Post-Disaster Damage Assessment and Analysis | Jun 19, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Barriers in Integrating Medical Visual Question Answering into Radiology Workflows: A Scoping Review and Clinicians' Insights | Jul 9, 2025 | Diagnostic, Medical Visual Question Answering | —Unverified | 0 | 0 |
| Test-Time Adaptation for Visual Document Understanding | Jun 15, 2022 | document understanding, Domain Adaptation | —Unverified | 0 | 0 |
| Text-Aware Dual Routing Network for Visual Question Answering | Nov 17, 2022 | Optical Character Recognition, Optical Character Recognition (OCR) | —Unverified | 0 | 0 |
| Barking Up The Syntactic Tree: Enhancing VLM Training with Syntactic Losses | Dec 11, 2024 | Image-text Retrieval, Question Answering | —Unverified | 0 | 0 |
| Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles | Jan 1, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Text-Guided Coarse-to-Fine Fusion Network for Robust Remote Sensing Visual Question Answering | Nov 24, 2024 | Question Answering, Relational Reasoning | —Unverified | 0 | 0 |
| Balancing Performance and Efficiency in Zero-shot Robotic Navigation | Jun 5, 2024 | Computational Efficiency, Question Answering | —Unverified | 0 | 0 |
| BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs | Jul 3, 2024 | Image Captioning, Image Generation | —Unverified | 0 | 0 |
| TextMatch: Enhancing Image-Text Consistency Through Multimodal Optimization | Dec 24, 2024 | In-Context Learning, Question Answering | —Unverified | 0 | 0 |
| DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering | May 1, 2022 | document understanding, Open-Domain Question Answering | —Unverified | 0 | 0 |
| TextSquare: Scaling up Text-Centric Visual Instruction Tuning | Apr 19, 2024 | Hallucination, Hallucination Evaluation | —Unverified | 0 | 0 |
| Textually Enriched Neural Module Networks for Visual Question Answering | Sep 23, 2018 | Image Captioning, Question Answering | —Unverified | 0 | 0 |
| TextVidBench: A Benchmark for Long Video Scene Text Understanding | Jun 5, 2025 | Prompt Engineering, Question Answering | —Unverified | 0 | 0 |
| VQABQ: Visual Question Answering by Basic Questions | Mar 19, 2017 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Backdooring Vision-Language Models with Out-Of-Distribution Data | Oct 2, 2024 | Image Captioning, Image to text | —Unverified | 0 | 0 |
| A Visual Question Answering Method for SAR Ship: Breaking the Requirement for Multimodal Dataset Construction and Model Fine-Tuning | Nov 3, 2024 | object-detection, Object Detection | —Unverified | 0 | 0 |
| The Color of the Cat is Gray: 1 Million Full-Sentences Visual Question Answering (FSVQA) | Sep 21, 2016 | Question Answering, Sentence | —Unverified | 0 | 0 |