OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models May 13, 2023 Key Information Extraction Nutrition
Code Code Available 25 GlyphDraw: Seamlessly Rendering Text with Intricate Spatial Structures in Text-to-Image Generation Mar 31, 2023 Image Generation Optical Character Recognition (OCR)
Code Code Available 25 GIT: A Generative Image-to-text Transformer for Vision and Language May 27, 2022 Decoder Image Captioning
Code Code Available 25 GlyphControl: Glyph Conditional Control for Visual Text Generation May 29, 2023 Optical Character Recognition (OCR) Text Generation
Code Code Available 25 Arabic-Nougat: Fine-Tuning Vision Transformers for Arabic OCR and Markdown Extraction Nov 19, 2024 document understanding Optical Character Recognition (OCR)
Code Code Available 25 A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding Jul 2, 2024 document understanding Key Information Extraction
Code Code Available 25 MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations Jul 1, 2024 Benchmarking document understanding
Code Code Available 25 MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts Oct 3, 2023 Chatbot Image Captioning
Code Code Available 25 Delivering Document Conversion as a Cloud Service with High Throughput and Responsiveness Jun 1, 2022 CPU document understanding
Code Code Available 25 Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition May 23, 2024 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 25 MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories Jun 5, 2025 Benchmarking Optical Character Recognition
Code Code Available 25 MouSi: Poly-Visual-Expert Vision-Language Models Jan 30, 2024 Image Segmentation Image-text matching
Code Code Available 25 Image-text matching for large-scale book collections Jul 29, 2024 Image-text matching Optical Character Recognition (OCR)
Code Code Available 15 Image-based table recognition: data, model, and evaluation Nov 25, 2019 Articles Decoder
Code Code Available 15 Confidence-aware Non-repetitive Multimodal Transformers for TextCaps Dec 7, 2020 Image Captioning Optical Character Recognition
Code Code Available 15 FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents May 27, 2019 Form Optical Character Recognition
Code Code Available 15 hmBERT: Historical Multilingual Language Models for Named Entity Recognition May 31, 2022 Language Modeling Language Modelling
Code Code Available 15 CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models Apr 3, 2024 Optical Character Recognition (OCR) speech-recognition
Code Code Available 15 A Benchmark and Dataset for Post-OCR text correction in Sanskrit Nov 15, 2022 Astronomy Optical Character Recognition (OCR)
Code Code Available 15 HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions Sep 18, 2022 object-detection Object Detection
Code Code Available 15 Combining Morphological and Histogram based Text Line Segmentation in the OCR Context Mar 16, 2021 Optical Character Recognition Optical Character Recognition (OCR)
Code Code Available 15 Adapting OCR with limited supervision Jul 27, 2020 Optical Character Recognition (OCR)
Code Code Available 15 Hespi: A pipeline for automatically detecting information from hebarium specimen sheets Oct 11, 2024 Handwritten Text Recognition HTR
Code Code Available 15 Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter Jun 10, 2021 Optical Character Recognition Optical Character Recognition (OCR)
Code Code Available 15 CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks Jun 11, 2020 Optical Character Recognition (OCR) Text Detection
Code Code Available 15 ChroniclingAmericaQA: A Large-scale Question Answering Dataset based on Historical American Newspaper Pages Mar 26, 2024 Machine Reading Comprehension Optical Character Recognition (OCR)
Code Code Available 15 German Parliamentary Corpus (GerParCor) Apr 21, 2022 Optical Character Recognition (OCR)
Code Code Available 15 GenKIE: Robust Generative Multimodal Document Key Information Extraction Oct 24, 2023 Decoder Key Information Extraction
Code Code Available 15 ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules Apr 5, 2023 Chart Understanding Derendering
Code Code Available 15 GenPlot: Increasing the Scale and Diversity of Chart Derendering Data Jun 20, 2023 Derendering Diversity
Code Code Available 15 Fused Text Recogniser and Deep Embeddings Improve Word Recognition and Retrieval Jul 1, 2020 Optical Character Recognition (OCR) Retrieval
Code Code Available 15 FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions May 28, 2023 Attribute Image Captioning
Code Code Available 15 Generating Synthetic Handwritten Historical Documents With OCR Constrained GANs Mar 15, 2021 Optical Character Recognition (OCR) Synthetic Data Generation
Code Code Available 15 Geometry Restoration and Dewarping of Camera-Captured Document Images Jan 6, 2025 Optical Character Recognition Optical Character Recognition (OCR)
Code Code Available 15 Graph Neural Networks and Representation Embedding for Table Extraction in PDF Documents Aug 23, 2022 Optical Character Recognition (OCR) Table Extraction
Code Code Available 15 Improving accuracy and speeding up Document Image Classification through parallel systems Jun 16, 2020 Document Classification document-image-classification
Code Code Available 15 FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding Jul 6, 2024 Optical Character Recognition (OCR) Visual Question Answering (VQA)
Code Code Available 15 FAWA: Fast Adversarial Watermark Attack on Optical Character Recognition (OCR) Systems Dec 15, 2020 Optical Character Recognition Optical Character Recognition (OCR)
Code Code Available 15 FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts Nov 9, 2023 Optical Character Recognition (OCR) Safety Alignment
Code Code Available 15 Exploring Better Text Image Translation with Multimodal Codebook May 27, 2023 Machine Translation Optical Character Recognition
Code Code Available 15 Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments Feb 10, 2025 Benchmarking Optical Character Recognition
Code Code Available 15 Exploring Cross-Image Pixel Contrast for Semantic Segmentation Jan 28, 2021 Metric Learning Optical Character Recognition (OCR)
Code Code Available 15 BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents Aug 10, 2021 Key Information Extraction Language Modeling
Code Code Available 15 bbOCR: An Open-source Multi-domain OCR Pipeline for Bengali Documents Aug 21, 2023 distortion correction Optical Character Recognition
Code Code Available 15 Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and In-depth Evaluation Oct 25, 2023 Handwritten Text Recognition Key Information Extraction
Code Code Available 15 ClusterTabNet: Supervised clustering method for table detection and table structure recognition Feb 12, 2024 Clustering Optical Character Recognition (OCR)
Code Code Available 15 Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval Aug 1, 2024 Attribute Optical Character Recognition
Code Code Available 15 Easter2.0: Improving convolutional models for handwritten text recognition May 30, 2022 Data Augmentation Few-Shot Learning
Code Code Available 15 A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition Dec 27, 2022 Optical Character Recognition Optical Character Recognition (OCR)
Code Code Available 15 EAST: An Efficient and Accurate Scene Text Detector Apr 11, 2017 Curved Text Detection Optical Character Recognition (OCR)
Code Code Available 15