| Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models | Mar 24, 2025 | MME, TextVQA | Code Available | 0 | 5 |
| Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA | Nov 14, 2019 | General Classification, TextVQA | Code Available | 0 | 5 |
| Towards a Unified Multimodal Reasoning Framework | Dec 22, 2023 | Multimodal Reasoning, Multiple-choice | Code Available | 0 | 5 |
| Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps | Dec 9, 2020 | Decoder, Image Captioning | Code Available | 0 | 5 |
| Separate and Locate: Rethink the Text in Text-based Visual Question Answering | Aug 31, 2023 | Optical Character Recognition (OCR), Position | Code Available | 0 | 5 |
| Track the Answer: Extending TextVQA from Image to Video with Spatio-Temporal Clues | Dec 17, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| VisLingInstruct: Elevating Zero-Shot Learning in Multi-Modal Language Models with Autonomous Instruction Optimization | Feb 12, 2024 | In-Context Learning, TextVQA | Code Available | 0 | 5 |
| InstructOCR: Instruction Boosting Scene Text Spotting | Dec 20, 2024 | Optical Character Recognition (OCR), Text Spotting | Code Available | 0 | 5 |
| Winner Team Mia at TextVQA Challenge 2021: Vision-and-Language Representation Learning with Pre-trained Sequence-to-Sequence Model | Jun 24, 2021 | Decoder, Language Modeling | Unverified | 0 | 0 |
| Analysing the Robustness of Vision-Language-Models to Common Corruptions | Apr 18, 2025 | TextVQA | Unverified | 0 | 0 |