| Mitigating Object Hallucinations via Sentence-Level Early Intervention | Jul 16, 2025 | Hallucination, MM-Vet | Code Available | 1 |
| TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation | Aug 3, 2022 | Answer Generation, Question-Answer-Generation | Code Available | 1 |
| LaTr: Layout-Aware Transformer for Scene-Text VQA | Dec 23, 2021 | Optical Character Recognition (OCR), Question Answering | Code Available | 1 |
| A First Look: Towards Explainable TextVQA Models via Visual and Textual Explanations | Apr 29, 2021 | TextVQA | Code Available | 1 |
| TAP: Text-Aware Pre-training for Text-VQA and Text-Caption | Dec 8, 2020 | Caption Generation, Language Modeling | Code Available | 1 |
| RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering | Oct 24, 2020 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 1 |
| Spatially Aware Multimodal Transformers for TextVQA | Jul 23, 2020 | Optical Character Recognition (OCR), Spatial Reasoning | Code Available | 1 |
| Structured Multimodal Attentions for TextVQA | Jun 1, 2020 | Graph Attention, Optical Character Recognition (OCR) | Code Available | 1 |
| TextSR: Diffusion Super-Resolution with Multilingual OCR Guidance | May 29, 2025 | Image Super-Resolution, Optical Character Recognition | Unverified | 0 |
| EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models | May 28, 2025 | Mixture-of-Experts, MME | Unverified | 0 |