SOTAVerified

Optical Character Recognition (OCR)

Optical Character Recognition or Optical Character Reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text. The source can be a scanned document, a photo of a document, a scene photo (for example, text on signs and billboards in a landscape photo, or license plates on cars), or subtitle text superimposed on an image (for example, from a television broadcast).

Papers

Showing 101-150 of 1209 papers

| Title | Status | Hype |
| --- | --- | --- |
| Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription | Code | 0 |
| NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts | | 0 |
| Detecting Offensive Memes with Social Biases in Singapore Context Using Multimodal Large Language Models | Code | 0 |
| Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI | | 0 |
| MultiOCR-QA: Dataset for Evaluating Robustness of LLMs in Question Answering on Multilingual OCR Texts | Code | 0 |
| Visual Zero-Shot E-Commerce Product Attribute Value Extraction | | 0 |
| KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding | | 0 |
| Harnessing PDF Data for Improving Japanese Large Multimodal Models | | 0 |
| Corrupted but Not Broken: Understanding and Mitigating the Negative Impacts of Corrupted Data in Visual Instruction Tuning | | 0 |
| Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models | Code | 0 |
| Southern Newswire Corpus: A Large-Scale Dataset of Mid-Century Wire Articles Beyond the Front Page | | 0 |
| MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | | 0 |
| Adapting Multilingual Embedding Models to Historical Luxembourgish | | 0 |
| Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments | Code | 1 |
| Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents | | 0 |
| Towards Making Flowchart Images Machine Interpretable | Code | 1 |
| MME-Industry: A Cross-Industry Multimodal Evaluation Benchmark | | 0 |
| Ocean-OCR: Towards General OCR Application via a Vision-Language Model | Code | 1 |
| Early evidence of how LLMs outperform traditional systems on OCR/HTR tasks for historical records | Code | 0 |
| Exploring AI-based System Design for Pixel-level Protected Health Information Detection in Medical Images | | 0 |
| MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents | | 0 |
| Jochre 3 and the Yiddish OCR corpus | Code | 0 |
| MathReader : Text-to-Speech for Mathematical Documents | Code | 1 |
| Comparative analysis of optical character recognition methods for Sámi texts from the National Library of Norway | Code | 0 |
| Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model | Code | 0 |
| Efficient License Plate Recognition in Videos Using Visual Rhythm and Accumulative Line Analysis | Code | 0 |
| BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations | | 0 |
| Geometry Restoration and Dewarping of Camera-Captured Document Images | Code | 1 |
| SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild | | 0 |
| Efficient Video-Based ALPR System Using YOLO and Visual Rhythm | Code | 0 |
| Emergency-Brake Simplex: Toward A Verifiably Safe Control-CPS Architecture for Abrupt Runtime Reachability Constraint Changes | | 0 |
| Crossing Language Borders: A Pipeline for Indonesian Manhwa Translation | Code | 0 |
| Embedding Similarity Guided License Plate Super Resolution | | 0 |
| Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark | | 0 |
| DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Code | 1 |
| CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR | | 0 |
| 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining | Code | 2 |
| OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning | Code | 4 |
| Optical Character Recognition using Convolutional Neural Networks for Ashokan Brahmi Inscriptions | | 0 |
| Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study | Code | 0 |
| ERPA: Efficient RPA Model Integrating OCR and LLMs for Intelligent Document Processing | | 0 |
| HAUR: Human Annotation Understanding and Recognition Through Text-Heavy Images | | 0 |
| VORTEX: A Spatial Computing Framework for Optimized Drone Telemetry Extraction from First-Person View Flight Data | | 0 |
| LMV-RPA: Large Model Voting-based Robotic Process Automation | Code | 0 |
| InstructOCR: Instruction Boosting Scene Text Spotting | Code | 0 |
| Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts | Code | 0 |
| TextSleuth: Towards Explainable Tampered Text Detection | | 0 |
| Track the Answer: Extending TextVQA from Image to Video with Spatio-Temporal Clues | Code | 0 |
| DoPTA: Improving Document Layout Analysis using Patch-Text Alignment | | 0 |
| Advanced ingestion process powered by LLM parsing for RAG system | | 0 |

Benchmark Results

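| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | DTrOCR | Accuracy (%) | 89.6 | | Unverified |
| 2 | DTrOCR 105M | Accuracy (%) | 89.6 | | Unverified |
| 3 | MaskOCR-L | Accuracy (%) | 82.6 | | Unverified |
| 4 | TransOCR | Accuracy (%) | 72.8 | | Unverified |
| 5 | SRN | Accuracy (%) | 65 | | Unverified |
| 6 | MORAN | Accuracy (%) | 64.3 | | Unverified |
| 7 | SEED | Accuracy (%) | 61.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4o | Average Accuracy | 76.22 | | Unverified |
| 2 | Gemini-1.5 Pro | Average Accuracy | 76.13 | | Unverified |
| 3 | Claude-3 Sonnet | Average Accuracy | 67.71 | | Unverified |
| 4 | RapidOCR | Average Accuracy | 56.98 | | Unverified |
| 5 | EasyOCR | Average Accuracy | 49.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | STREET | Sequence error | 27.54 | | Unverified |
| 2 | SEE | Sequence error | 22 | | Unverified |
| 3 | AttentionOCR_Inception-resnet-v2_Location | Sequence error | 15.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | I2L-NOPOOL | BLEU | 89.09 | | Unverified |
| 2 | I2L-STRIPS | BLEU | 89 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Tesseract | Character Error Rate (CER) | 0.08 | | Unverified |
| 2 | EasyOCR | Character Error Rate (CER) | 0.07 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | I2L-STRIPS | BLEU | 88.86 | | Unverified |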