| SWIFT: A Scalable lightWeight Infrastructure for Fine-Tuning | Aug 10, 2024 | Hallucination, Optical Character Recognition | Code Available | 11 |
| DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Dec 13, 2024 | Chart Understanding, Mixture-of-Experts | Code Available | 9 |
| Nougat: Neural Optical Understanding for Academic Documents | Aug 25, 2023 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 5 |
| OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning | Dec 31, 2024 | Benchmarking, Logical Reasoning | Code Available | 4 |
| Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders | Aug 28, 2024 | Optical Character Recognition | Code Available | 4 |
| Playing Non-Embedded Card-Based Games with Reinforcement Learning | Apr 7, 2025 | Board Games, Decision Making | Code Available | 3 |
| OCR-free Document Understanding Transformer | Nov 30, 2021 | Document Image Classification, document understanding | Code Available | 3 |
| MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories | Jun 5, 2025 | Benchmarking, Optical Character Recognition | Code Available | 2 |
| OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation | Dec 3, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 2 |
| ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area | Aug 14, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition | May 23, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 |
| NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement | Apr 8, 2024 | Binarization, Document Enhancement | Code Available | 2 |
| DTrOCR: Decoder-only Transformer for Optical Character Recognition | Aug 30, 2023 | Decoder, Handwritten Text Recognition | Code Available | 2 |
| OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models | May 13, 2023 | Key Information Extraction, Nutrition | Code Available | 2 |
| IMKGA-SM: Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling | Jan 6, 2023 | Link Prediction, Optical Character Recognition | Code Available | 2 |
| Text Detection Forgot About Document OCR | Oct 14, 2022 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 2 |
| Delivering Document Conversion as a Cloud Service with High Throughput and Responsiveness | Jun 1, 2022 | CPU, document understanding | Code Available | 2 |
| GIT: A Generative Image-to-text Transformer for Vision and Language | May 27, 2022 | Decoder, Image Captioning | Code Available | 2 |
| PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System | Sep 7, 2021 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 2 |
| PP-OCR: A Practical Ultra Lightweight OCR System | Sep 21, 2020 | Computational Efficiency, Optical Character Recognition | Code Available | 2 |
| Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition | May 29, 2025 | Handwritten Mathematical Expression Recognition, Language Modeling | Code Available | 1 |
| Reasoning-OCR: Can Large Multimodal Models Solve Complex Logical Reasoning Problems from OCR Cues? | May 19, 2025 | Logical Reasoning, Optical Character Recognition | Code Available | 1 |
| LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images? | May 18, 2025 | Logical Reasoning, Multimodal Reasoning | Code Available | 1 |
| Multimodal LLMs for OCR, OCR Post-Correction, and Named Entity Recognition in Historical Documents | Apr 1, 2025 | named-entity-recognition, Named Entity Recognition | Code Available | 1 |
| Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments | Feb 10, 2025 | Benchmarking, Optical Character Recognition | Code Available | 1 |
| Geometry Restoration and Dewarping of Camera-Captured Document Images | Jan 6, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 1 |
| Toxicity of the Commons: Curating Open-Source Pre-Training Data | Oct 29, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 1 |
| Hespi: A pipeline for automatically detecting information from hebarium specimen sheets | Oct 11, 2024 | Handwritten Text Recognition, HTR | Code Available | 1 |
| Enhancing License Plate Super-Resolution: A Layout-Aware and Character-Driven Approach | Aug 27, 2024 | License Plate Recognition, Optical Character Recognition | Code Available | 1 |
| Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval | Aug 1, 2024 | Attribute, Optical Character Recognition | Code Available | 1 |
| VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text | Jun 10, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| CORU: Comprehensive Post-OCR Parsing and Receipt Understanding Dataset | Jun 6, 2024 | object-detection, Object Detection | Code Available | 1 |
| ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images | Apr 29, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 1 |
| PEaCE: A Chemistry-Oriented Dataset for Optical Character Recognition on Scientific Documents | Mar 23, 2024 | Articles, Optical Character Recognition | Code Available | 1 |
| ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting | Mar 1, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 1 |
| An Empirical Study of Scaling Law for OCR | Dec 29, 2023 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 1 |
| Data Generation for Post-OCR correction of Cyrillic handwriting | Nov 27, 2023 | Handwriting generation, Handwritten Text Recognition | Code Available | 1 |
| Exploring OCR Capabilities of GPT-4V(ision): A Quantitative and In-depth Evaluation | Oct 25, 2023 | Handwritten Text Recognition, Key Information Extraction | Code Available | 1 |
| GenKIE: Robust Generative Multimodal Document Key Information Extraction | Oct 24, 2023 | Decoder, Key Information Extraction | Code Available | 1 |
| Persis: A Persian Font Recognition Pipeline Using Convolutional Neural Networks | Oct 8, 2023 | Binarization, CPU | Code Available | 1 |
| bbOCR: An Open-source Multi-domain OCR Pipeline for Bengali Documents | Aug 21, 2023 | distortion correction, Optical Character Recognition | Code Available | 1 |
| OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation | Aug 8, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Universal Defensive Underpainting Patch: Making Your Text Invisible to Optical Character Recognition | Aug 4, 2023 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 1 |
| Validation of a Zero-Shot Learning Natural Language Processing Tool for Data Abstraction from Unstructured Healthcare Data | Jul 23, 2023 | Optical Character Recognition, Zero-Shot Learning | Code Available | 1 |
| T-MARS: Improving Visual Representations by Circumventing Text Feature Learning | Jul 6, 2023 | Optical Character Recognition | Code Available | 1 |
| TransDocAnalyser: A Framework for Offline Semi-structured Handwritten Document Analysis in the Legal Domain | Jun 3, 2023 | Benchmarking, Decoder | Code Available | 1 |
| Exploring Better Text Image Translation with Multimodal Codebook | May 27, 2023 | Machine Translation, Optical Character Recognition | Code Available | 1 |
| Super-Resolution of License Plate Images Using Attention Modules and Sub-Pixel Convolution Layers | May 27, 2023 | Image Super-Resolution, License Plate Recognition | Code Available | 1 |
| DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents | Apr 24, 2023 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 1 |
| Efficient OCR for Building a Diverse Digital History | Apr 5, 2023 | Diversity, Image Retrieval | Code Available | 1 |