| Named Entity Recognition in Historic Legal Text: A Transformer and State Machine Ensemble Method | Nov 1, 2021 | Language Modeling | Unverified | 0 |
| Cleaning Dirty Books: Post-OCR Processing for Previously Scanned Texts | Oct 22, 2021 | Optical Character Recognition (OCR) | Code Available | 0 |
| Optical Character Recognition of 19th Century Classical Commentaries: the Current State of Affairs | Oct 13, 2021 | Optical Character Recognition (OCR) | Code Available | 0 |
| WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition | Oct 7, 2021 | Label Error Detection, Optical Character Recognition | Code Available | 1 |
| A Proposal of Automatic Error Correction in Text | Sep 24, 2021 | Information Retrieval, Language Modelling | Unverified | 0 |
| TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models | Sep 21, 2021 | Handwritten Text Recognition, Language Modeling | Code Available | 1 |
| Deep learning-based NLP Data Pipeline for EHR Scanned Document Information Extraction | Sep 14, 2021 | Optical Character Recognition (OCR) | Unverified | 0 |
| Post-OCR Document Correction with large Ensembles of Character Sequence-to-Sequence Models | Sep 13, 2021 | Optical Character Recognition (OCR) | Code Available | 1 |
| PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System | Sep 7, 2021 | Optical Character Recognition (OCR) | Code Available | 2 |
| A Novel Machine Learning Based Approach for Post-OCR Error Detection | Sep 1, 2021 | BIG-bench Machine Learning, Optical Character Recognition | Unverified | 0 |