GlyphControl: Glyph Conditional Control for Visual Text Generation | May 29, 2023 | Optical Character Recognition (OCR), Text Generation
OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models | May 13, 2023 | Key Information Extraction, Nutrition
GlyphDraw: Seamlessly Rendering Text with Intricate Spatial Structures in Text-to-Image Generation | Mar 31, 2023 | Image Generation, Optical Character Recognition (OCR)
IMKGA-SM: Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling | Jan 6, 2023 | Link Prediction, Optical Character Recognition
Text Detection Forgot About Document OCR | Oct 14, 2022 | Optical Character Recognition, Optical Character Recognition (OCR)
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | Oct 7, 2022 | Chart Question Answering, Diversity
When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition | Jul 23, 2022 | Decoder, Handwritten Mathematical Expression Recognition
Delivering Document Conversion as a Cloud Service with High Throughput and Responsiveness | Jun 1, 2022 | CPU, document understanding
GIT: A Generative Image-to-text Transformer for Vision and Language | May 27, 2022 | Decoder, Image Captioning
PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System | Sep 7, 2021 | Optical Character Recognition, Optical Character Recognition (OCR)
PP-OCR: A Practical Ultra Lightweight OCR System | Sep 21, 2020 | Computational Efficiency, Optical Character Recognition
Real-time Scene Text Detection with Differentiable Binarization | Nov 20, 2019 | Binarization, Optical Character Recognition (OCR)
Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition | May 29, 2025 | Handwritten Mathematical Expression Recognition, Language Modeling
ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge | May 28, 2025 | Imitation Learning, Math
VidText: Towards Comprehensive Evaluation for Video Text Understanding | May 28, 2025 | Multimodal Reasoning, Optical Character Recognition (OCR)
Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging | May 26, 2025 | Language Modeling, Language Modelling
ReadBench: Measuring the Dense Text Visual Reading Ability of Vision-Language Models | May 25, 2025 | Optical Character Recognition (OCR), Reading Comprehension
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark | May 22, 2025 | document understanding, Multimodal Reasoning
Reasoning-OCR: Can Large Multimodal Models Solve Complex Logical Reasoning Problems from OCR Cues? | May 19, 2025 | Logical Reasoning, Optical Character Recognition
LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images? | May 18, 2025 | Logical Reasoning, Multimodal Reasoning
Multimodal LLMs for OCR, OCR Post-Correction, and Named Entity Recognition in Historical Documents | Apr 1, 2025 | named-entity-recognition, Named Entity Recognition
PM4Bench: A Parallel Multilingual Multi-Modal Multi-task Benchmark for Large Vision Language Model | Mar 24, 2025 | Language Modeling, Language Modelling
Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments | Feb 10, 2025 | Benchmarking, Optical Character Recognition
Towards Making Flowchart Images Machine Interpretable | Jan 29, 2025 | Code Generation, Optical Character Recognition (OCR)
Ocean-OCR: Towards General OCR Application via a Vision-Language Model | Jan 26, 2025 | document understanding, Language Modeling
MathReader: Text-to-Speech for Mathematical Documents | Jan 13, 2025 | Optical Character Recognition (OCR), text-to-speech
Geometry Restoration and Dewarping of Camera-Captured Document Images | Jan 6, 2025 | Optical Character Recognition, Optical Character Recognition (OCR)
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Jan 1, 2025 | document understanding, Optical Character Recognition (OCR)
Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts | Nov 16, 2024 | Mixture-of-Experts, Optical Character Recognition (OCR)
Toxicity of the Commons: Curating Open-Source Pre-Training Data | Oct 29, 2024 | Optical Character Recognition, Optical Character Recognition (OCR)
Stratified Domain Adaptation: A Progressive Self-Training Approach for Scene Text Recognition | Oct 13, 2024 | Domain Adaptation, Optical Character Recognition (OCR)
Hespi: A pipeline for automatically detecting information from hebarium specimen sheets | Oct 11, 2024 | Handwritten Text Recognition, HTR
One Model is All You Need: ByT5-Sanskrit, a Unified Model for Sanskrit NLP Tasks | Sep 20, 2024 | All, Dependency Parsing
Enhancing License Plate Super-Resolution: A Layout-Aware and Character-Driven Approach | Aug 27, 2024 | License Plate Recognition, Optical Character Recognition
DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Aug 27, 2024 | document understanding, Optical Character Recognition (OCR)
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval | Aug 1, 2024 | Attribute, Optical Character Recognition
Image-text matching for large-scale book collections | Jul 29, 2024 | Image-text matching, Optical Character Recognition (OCR)
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding | Jul 17, 2024 | document understanding, Optical Character Recognition (OCR)
FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding | Jul 6, 2024 | Optical Character Recognition (OCR), Visual Question Answering (VQA)
CORU: Comprehensive Post-OCR Parsing and Receipt Understanding Dataset | Jun 6, 2024 | object-detection, Object Detection
From Text to Pixel: Advancing Long-Context Understanding in MLLMs | May 23, 2024 | Language Modeling, Language Modelling
ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images | Apr 29, 2024 | Optical Character Recognition, Optical Character Recognition (OCR)
CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models | Apr 3, 2024 | Optical Character Recognition (OCR), speech-recognition
ChroniclingAmericaQA: A Large-scale Question Answering Dataset based on Historical American Newspaper Pages | Mar 26, 2024 | Machine Reading Comprehension, Optical Character Recognition (OCR)
PEaCE: A Chemistry-Oriented Dataset for Optical Character Recognition on Scientific Documents | Mar 23, 2024 | Articles, Optical Character Recognition
ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting | Mar 1, 2024 | Optical Character Recognition, Optical Character Recognition (OCR)
TEXTRON: Weakly Supervised Multilingual Text Detection through Data Programming | Feb 15, 2024 | Optical Character Recognition (OCR), Text Detection
ClusterTabNet: Supervised clustering method for table detection and table structure recognition | Feb 12, 2024 | Clustering, Optical Character Recognition (OCR)
An Empirical Study of Scaling Law for OCR | Dec 29, 2023 | Optical Character Recognition, Optical Character Recognition (OCR)
When Graph Data Meets Multimodal: A New Paradigm for Graph Understanding and Reasoning | Dec 16, 2023 | Optical Character Recognition (OCR)