| Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering | Sep 21, 2022 | Image Captioning, Optical Character Recognition (OCR) | Unverified | 0 |
| Towards Escaping from Language Bias and OCR Error: Semantics-Centered Text Visual Question Answering | Mar 24, 2022 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Graph Relation Transformer: Incorporating pairwise object features into the Transformer architecture | Nov 11, 2021 | Graph Attention, Question Answering | Unverified | 0 |
| Winner Team Mia at TextVQA Challenge 2021: Vision-and-Language Representation Learning with Pre-trained Sequence-to-Sequence Model | Jun 24, 2021 | Decoder, Language Modeling | Unverified | 0 |
| TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text | May 12, 2021 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps | Dec 9, 2020 | Decoder, Image Captioning | Code Available | 0 |
| Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA | Nov 14, 2019 | General Classification, TextVQA | Code Available | 0 |