| CogVLM: Visual Expert for Pretrained Language Models | Nov 6, 2023 | 1 Image, 2*2 StitchingFS-MEVQA | CodeCode Available | 5 |
| Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA | Oct 13, 2023 | Graph LearningObject | —Unverified | 0 |
| Sentence Attention Blocks for Answer Grounding | Sep 20, 2023 | Question AnsweringSentence | —Unverified | 0 |
| Separate and Locate: Rethink the Text in Text-based Visual Question Answering | Aug 31, 2023 | Optical Character Recognition (OCR)Position | CodeCode Available | 0 |
| Making the V in Text-VQA Matter | Aug 1, 2023 | Optical Character Recognition (OCR)TextVQA | —Unverified | 0 |
| Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA | Apr 4, 2023 | Answer GenerationLanguage Modelling | —Unverified | 0 |
| SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering | Dec 16, 2022 | Optical Character RecognitionOptical Character Recognition (OCR) | —Unverified | 0 |
| Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering | Sep 21, 2022 | Image CaptioningOptical Character Recognition (OCR) | —Unverified | 0 |
| TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation | Aug 3, 2022 | Answer GenerationQuestion-Answer-Generation | CodeCode Available | 1 |
| Towards Escaping from Language Bias and OCR Error: Semantics-Centered Text Visual Question Answering | Mar 24, 2022 | Optical Character RecognitionOptical Character Recognition (OCR) | —Unverified | 0 |
| LaTr: Layout-Aware Transformer for Scene-Text VQA | Dec 23, 2021 | Optical Character Recognition (OCR)Question Answering | CodeCode Available | 1 |
| Graph Relation Transformer: Incorporating pairwise object features into the Transformer architecture | Nov 11, 2021 | Graph AttentionQuestion Answering | —Unverified | 0 |
| Winner Team Mia at TextVQA Challenge 2021: Vision-and-Language Representation Learning with Pre-trained Sequence-to-Sequence Model | Jun 24, 2021 | DecoderLanguage Modeling | —Unverified | 0 |
| TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text | May 12, 2021 | Optical Character RecognitionOptical Character Recognition (OCR) | —Unverified | 0 |
| A First Look: Towards Explainable TextVQA Models via Visual and Textual Explanations | Apr 29, 2021 | TextVQA | CodeCode Available | 1 |
| Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps | Dec 9, 2020 | DecoderImage Captioning | CodeCode Available | 0 |
| TAP: Text-Aware Pre-training for Text-VQA and Text-Caption | Dec 8, 2020 | Caption GenerationLanguage Modeling | CodeCode Available | 1 |
| RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering | Oct 24, 2020 | Optical Character RecognitionOptical Character Recognition (OCR) | CodeCode Available | 1 |
| Spatially Aware Multimodal Transformers for TextVQA | Jul 23, 2020 | Optical Character Recognition (OCR)Spatial Reasoning | CodeCode Available | 1 |
| Structured Multimodal Attentions for TextVQA | Jun 1, 2020 | Graph AttentionOptical Character Recognition (OCR) | CodeCode Available | 1 |
| Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA | Nov 14, 2019 | General ClassificationTextVQA | CodeCode Available | 0 |
| Towards VQA Models That Can Read | Apr 18, 2019 | TextVQAVisual Question Answering (VQA) | CodeCode Available | 3 |