| SWIFT: A Scalable lightWeight Infrastructure for Fine-Tuning | Aug 10, 2024 | Hallucination, Optical Character Recognition | Code Available | 11 |
| DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Dec 13, 2024 | Chart Understanding, Mixture-of-Experts | Code Available | 9 |
| Nougat: Neural Optical Understanding for Academic Documents | Aug 25, 2023 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 5 |
| Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders | Aug 28, 2024 | Optical Character Recognition | Code Available | 4 |
| OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning | Dec 31, 2024 | Benchmarking, Logical Reasoning | Code Available | 4 |
| OCR-free Document Understanding Transformer | Nov 30, 2021 | Document Image Classification, document understanding | Code Available | 3 |
| Playing Non-Embedded Card-Based Games with Reinforcement Learning | Apr 7, 2025 | Board Games, Decision Making | Code Available | 3 |
| DTrOCR: Decoder-only Transformer for Optical Character Recognition | Aug 30, 2023 | Decoder, Handwritten Text Recognition | Code Available | 2 |
| Text Detection Forgot About Document OCR | Oct 14, 2022 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 2 |
| OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models | May 13, 2023 | Key Information Extraction, Nutrition | Code Available | 2 |
| PP-OCR: A Practical Ultra Lightweight OCR System | Sep 21, 2020 | Computational Efficiency, Optical Character Recognition | Code Available | 2 |
| OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation | Dec 3, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 2 |
| PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System | Sep 7, 2021 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 2 |
| NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement | Apr 8, 2024 | Binarization, Document Enhancement | Code Available | 2 |
| GIT: A Generative Image-to-text Transformer for Vision and Language | May 27, 2022 | Decoder, Image Captioning | Code Available | 2 |
| Delivering Document Conversion as a Cloud Service with High Throughput and Responsiveness | Jun 1, 2022 | CPU, document understanding | Code Available | 2 |
| Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition | May 23, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 |
| MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories | Jun 5, 2025 | Benchmarking, Optical Character Recognition | Code Available | 2 |
| IMKGA-SM: Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling | Jan 6, 2023 | Link Prediction, Optical Character Recognition | Code Available | 2 |
| ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area | Aug 14, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Fully Unsupervised Diversity Denoising with Convolutional Variational Autoencoders | Jun 10, 2020 | Cell Segmentation, Denoising | Code Available | 1 |
| Detection of Furigana Text in Images | Jul 8, 2022 | object-detection, Object Detection | Code Available | 1 |
| FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents | May 27, 2019 | Form, Optical Character Recognition | Code Available | 1 |
| Digitizing Historical Balance Sheet Data: A Practitioner's Guide | Mar 31, 2022 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 1 |
| DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents | Apr 24, 2023 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 1 |