| Named Entity Recognition in Historic Legal Text: A Transformer and State Machine Ensemble Method | Nov 1, 2021 | Language Modeling | Unverified | 0 |
| Cleaning Dirty Books: Post-OCR Processing for Previously Scanned Texts | Oct 22, 2021 | Optical Character Recognition | Code Available | 0 |
| Optical Character Recognition of 19th Century Classical Commentaries: the Current State of Affairs | Oct 13, 2021 | Optical Character Recognition | Code Available | 0 |
| WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition | Oct 7, 2021 | Label Error Detection, Optical Character Recognition | Code Available | 1 |
| A Proposal of Automatic Error Correction in Text | Sep 24, 2021 | Information Retrieval, Language Modeling | Unverified | 0 |
| TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models | Sep 21, 2021 | Handwritten Text Recognition, Language Modeling | Code Available | 1 |
| Deep learning-based NLP Data Pipeline for EHR Scanned Document Information Extraction | Sep 14, 2021 | Optical Character Recognition | Unverified | 0 |
| Post-OCR Document Correction with large Ensembles of Character Sequence-to-Sequence Models | Sep 13, 2021 | Optical Character Recognition | Code Available | 1 |
| PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System | Sep 7, 2021 | Optical Character Recognition | Code Available | 2 |
| A Novel Machine Learning Based Approach for Post-OCR Error Detection | Sep 1, 2021 | BIG-bench Machine Learning, Optical Character Recognition | Unverified | 0 |
| OCR Processing of Swedish Historical Newspapers Using Deep Hybrid CNN–LSTM Networks | Sep 1, 2021 | Optical Character Recognition | Unverified | 0 |
| A Multimodal Framework for Video Ads Understanding | Aug 29, 2021 | Marketing, Optical Character Recognition | Unverified | 0 |
| Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling | Aug 20, 2021 | Data Ablation, Optical Character Recognition | Unverified | 0 |
| VisBuddy -- A Smart Wearable Assistant for the Visually Challenged | Aug 17, 2021 | Image Captioning, Object Detection | Unverified | 0 |
| Lights, Camera, Action! A Framework to Improve NLP Accuracy over OCR documents | Aug 6, 2021 | Named Entity Recognition | Code Available | 1 |
| Robust Learning for Text Classification with Multi-source Noise Simulation and Hard Example Mining | Jul 15, 2021 | Optical Character Recognition | Code Available | 1 |
| An End-to-End Khmer Optical Character Recognition using Sequence-to-Sequence with Attention | Jun 21, 2021 | Decoder, Optical Character Recognition | Unverified | 0 |
| Tag, Copy or Predict: A Unified Weakly-Supervised Learning Framework for Visual Information Extraction using Sequences | Jun 20, 2021 | Decoder, Optical Character Recognition | Unverified | 0 |
| Classification of Documents Extracted from Images with Optical Character Recognition Methods | Jun 15, 2021 | BIG-bench Machine Learning, Optical Character Recognition | Unverified | 0 |
| Mixed Model OCR Training on Historical Latin Script for Out-of-the-Box Recognition and Finetuning | Jun 15, 2021 | Data Augmentation, Optical Character Recognition | Unverified | 0 |
| Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter | Jun 10, 2021 | Optical Character Recognition | Code Available | 1 |
| Classification of Contract-Amendment Relationships | Jun 8, 2021 | Classification, Management | Unverified | 0 |
| PAM: Understanding Product Images in Cross Product Category Attribute Extraction | Jun 8, 2021 | Attribute, Attribute Extraction | Unverified | 0 |
| Bangla Natural Language Processing: A Comprehensive Analysis of Classical, Machine Learning, and Deep Learning Based Methods | May 31, 2021 | Articles, BIG-bench Machine Learning | Unverified | 0 |
| Empirical Error Modeling Improves Robustness of Noisy Neural Sequence Labeling | May 25, 2021 | Language Modeling | Code Available | 0 |
| Multi-Type-TD-TSR -- Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition: from OCR to Structured Table Representations | May 23, 2021 | Optical Character Recognition | Code Available | 1 |
| Simple Transparent Adversarial Examples | May 20, 2021 | Image Generation, Object Detection | Unverified | 0 |
| End-to-End Unsupervised Document Image Blind Denoising | May 19, 2021 | Denoising, Image Denoising | Unverified | 0 |
| STRIDE : Scene Text Recognition In-Device | May 17, 2021 | Optical Character Recognition | Unverified | 0 |
| Unknown-box Approximation to Improve Optical Character Recognition Performance | May 17, 2021 | Optical Character Recognition | Code Available | 1 |
| TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text | May 12, 2021 | Optical Character Recognition | Unverified | 0 |
| Supporting Land Reuse of Former Open Pit Mining Sites using Text Classification and Active Learning | May 12, 2021 | Active Learning, Optical Character Recognition | Unverified | 0 |
| An end-to-end Optical Character Recognition approach for ultra-low-resolution printed text images | May 10, 2021 | Optical Character Recognition | Unverified | 0 |
| End-to-End Optical Character Recognition for Bengali Handwritten Words | May 9, 2021 | Optical Character Recognition | Code Available | 0 |
| Operationalizing a National Digital Library: The Case for a Norwegian Transformer Model | Apr 19, 2021 | Language Modeling | Code Available | 1 |
| Document Layout Analysis via Dynamic Residual Feature Fusion | Apr 7, 2021 | Document Layout Analysis, Optical Character Recognition | Unverified | 0 |
| We Live in a Motorized Civilization: Robert Moses Replies to Robert Caro | Mar 26, 2021 | Optical Character Recognition | Unverified | 0 |
| Interpretable Distance Metric Learning for Handwritten Chinese Character Recognition | Mar 17, 2021 | Diversity, Handwriting Recognition | Unverified | 0 |
| Combining Morphological and Histogram based Text Line Segmentation in the OCR Context | Mar 16, 2021 | Optical Character Recognition | Code Available | 1 |
| Deep Structured Feature Networks for Table Detection and Tabular Data Extraction from Scanned Financial Document Images | Feb 20, 2021 | Optical Character Recognition | Unverified | 0 |
| SPAN: a Simple Predict & Align Network for Handwritten Paragraph Recognition | Feb 17, 2021 | Handwriting Recognition, Handwritten Text Recognition | Code Available | 0 |
| Neural OCR Post-Hoc Correction of Historical Corpora | Feb 1, 2021 | Optical Character Recognition | Code Available | 1 |
| It Takes Two to Tango: Combining Visual and Textual Information for Detecting Duplicate Video-Based Bug Reports | Jan 22, 2021 | Optical Character Recognition | Code Available | 0 |
| On-Device Document Classification using multimodal features | Jan 6, 2021 | Classification, Document Classification | Unverified | 0 |
| Iranis: A Large-scale Dataset of Farsi License Plate Characters | Jan 1, 2021 | Image Classification | Code Available | 1 |
| NOSE Augment: Fast and Effective Data Augmentation Without Searching | Jan 1, 2021 | Data Augmentation, Diversity | Unverified | 0 |
| ConvMath: A Convolutional Sequence Network for Mathematical Expression Recognition | Dec 23, 2020 | Decoder, Optical Character Recognition | Unverified | 0 |
| Indonesian ID Card Extractor Using Optical Character Recognition and Natural Language Post-Processing | Dec 15, 2020 | Optical Character Recognition | Unverified | 0 |
| FAWA: Fast Adversarial Watermark Attack on Optical Character Recognition (OCR) Systems | Dec 15, 2020 | Optical Character Recognition | Code Available | 1 |
| Vartani Spellcheck -- Automatic Context-Sensitive Spelling Correction of OCR-generated Hindi Text Using BERT and Levenshtein Distance | Dec 14, 2020 | Named Entity Recognition | Unverified | 0 |