OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation (Dec 3, 2024). Tags: Optical Character Recognition; Optical Character Recognition (OCR)
GIT: A Generative Image-to-text Transformer for Vision and Language (May 27, 2022). Tags: Decoder; Image Captioning
OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models (May 13, 2023). Tags: Key Information Extraction; Nutrition
PP-OCR: A Practical Ultra Lightweight OCR System (Sep 21, 2020). Tags: Computational Efficiency; Optical Character Recognition
NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement (Apr 8, 2024). Tags: Binarization; Document Enhancement
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations (Jul 1, 2024). Tags: Benchmarking; document understanding
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding (Jul 2, 2024). Tags: document understanding; Key Information Extraction
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding (Jun 29, 2023). Tags: 16k; Image Captioning
MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories (Jun 5, 2025). Tags: Benchmarking; Optical Character Recognition
Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition (May 23, 2024). Tags: Automatic Speech Recognition; Automatic Speech Recognition (ASR)
TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action (Dec 7, 2024). Tags: Depth Estimation; Mathematical Reasoning
Visually Guided Generative Text-Layout Pre-training for Document Intelligence (Mar 25, 2024). Tags: Document Classification; document understanding
Image-text matching for large-scale book collections (Jul 29, 2024). Tags: Image-text matching; Optical Character Recognition (OCR)
Image-based table recognition: data, model, and evaluation (Nov 25, 2019). Tags: Articles; Decoder
Hespi: A pipeline for automatically detecting information from hebarium specimen sheets (Oct 11, 2024). Tags: Handwritten Text Recognition; HTR
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents (May 27, 2019). Tags: Form; Optical Character Recognition
hmBERT: Historical Multilingual Language Models for Named Entity Recognition (May 31, 2022). Tags: Language Modeling; Language Modelling
Graph Neural Networks and Representation Embedding for Table Extraction in PDF Documents (Aug 23, 2022). Tags: Optical Character Recognition (OCR); Table Extraction
A Benchmark and Dataset for Post-OCR text correction in Sanskrit (Nov 15, 2022). Tags: Astronomy; Optical Character Recognition (OCR)
A Robust Real-Time Automatic License Plate Recognition Based on the YOLO Detector (Feb 26, 2018). Tags: Data Augmentation; License Plate Detection
Adapting OCR with limited supervision (Jul 27, 2020). Tags: Optical Character Recognition (OCR)
HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions (Sep 18, 2022). Tags: object-detection; Object Detection
Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter (Jun 10, 2021). Tags: Optical Character Recognition; Optical Character Recognition (OCR)
Geometry Restoration and Dewarping of Camera-Captured Document Images (Jan 6, 2025). Tags: Optical Character Recognition; Optical Character Recognition (OCR)
GenPlot: Increasing the Scale and Diversity of Chart Derendering Data (Jun 20, 2023). Tags: Derendering; Diversity
German Parliamentary Corpus (GerParCor) (Apr 21, 2022). Tags: Optical Character Recognition (OCR)
Generating Synthetic Handwritten Historical Documents With OCR Constrained GANs (Mar 15, 2021). Tags: Optical Character Recognition (OCR); Synthetic Data Generation
GenKIE: Robust Generative Multimodal Document Key Information Extraction (Oct 24, 2023). Tags: Decoder; Key Information Extraction
Fused Text Recogniser and Deep Embeddings Improve Word Recognition and Retrieval (Jul 1, 2020). Tags: Optical Character Recognition (OCR); Retrieval
A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition (Dec 27, 2022). Tags: Optical Character Recognition; Optical Character Recognition (OCR)
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions (May 28, 2023). Tags: Attribute; Image Captioning
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval (Aug 1, 2024). Tags: Attribute; Optical Character Recognition
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark (May 22, 2025). Tags: document understanding; Multimodal Reasoning
From Text to Pixel: Advancing Long-Context Understanding in MLLMs (May 23, 2024). Tags: Language Modeling; Language Modelling
Improving accuracy and speeding up Document Image Classification through parallel systems (Jun 16, 2020). Tags: Document Classification; document-image-classification
Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and In-depth Evaluation (Oct 25, 2023). Tags: Handwritten Text Recognition; Key Information Extraction
Efficient OCR for Building a Diverse Digital History (Apr 5, 2023). Tags: Diversity; Image Retrieval
Enhancing License Plate Super-Resolution: A Layout-Aware and Character-Driven Approach (Aug 27, 2024). Tags: License Plate Recognition; Optical Character Recognition
FAWA: Fast Adversarial Watermark Attack on Optical Character Recognition (OCR) Systems (Dec 15, 2020). Tags: Optical Character Recognition; Optical Character Recognition (OCR)
DSG: An End-to-End Document Structure Generator (Oct 13, 2023). Tags: Optical Character Recognition (OCR)
EAST: An Efficient and Accurate Scene Text Detector (Apr 11, 2017). Tags: Curved Text Detection; Optical Character Recognition (OCR)
End-to-End Information Extraction by Character-Level Embedding and Multi-Stage Attentional U-Net (Jun 2, 2021). Tags: Optical Character Recognition (OCR)
Exploring Better Text Image Translation with Multimodal Codebook (May 27, 2023). Tags: Machine Translation; Optical Character Recognition
Exploring Cross-Image Pixel Contrast for Semantic Segmentation (Jan 28, 2021). Tags: Metric Learning; Optical Character Recognition (OCR)
Easter2.0: Improving convolutional models for handwritten text recognition (May 30, 2022). Tags: Data Augmentation; Few-Shot Learning
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts (Nov 9, 2023). Tags: Optical Character Recognition (OCR); Safety Alignment
DocScanner: Robust Document Image Rectification with Progressive Learning (Oct 28, 2021). Tags: Optical Character Recognition (OCR)
DocReal: Robust Document Dewarping of Real-Life Images via Attention-Enhanced Control Point Prediction (Dec 1, 2023). Tags: Optical Character Recognition (OCR)
DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction (Oct 25, 2021). Tags: Optical Character Recognition (OCR)
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding (Jan 1, 2025). Tags: document understanding; Optical Character Recognition (OCR)