| Towards a Unified Multimodal Reasoning Framework | Dec 22, 2023 | Multimodal Reasoning, Multiple-choice | Code Available | 0 |
| Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering | Mar 14, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| InstructOCR: Instruction Boosting Scene Text Spotting | Dec 20, 2024 | Optical Character Recognition (OCR), Text Spotting | Code Available | 0 |
| Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps | Dec 9, 2020 | Decoder, Image Captioning | Code Available | 0 |
| Separate and Locate: Rethink the Text in Text-based Visual Question Answering | Aug 31, 2023 | Optical Character Recognition (OCR), Position | Code Available | 0 |
| OmniFusion Technical Report | Apr 9, 2024 | MM-Vet, TextVQA | Code Available | 0 |
| Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA | Nov 14, 2019 | General Classification, TextVQA | Code Available | 0 |