SOTAVerified

Optical Character Recognition (OCR)

Optical Character Recognition or Optical Character Reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo (for example, the text on signs and billboards in a landscape photo, or license plates on cars), or subtitle text superimposed on an image (for example, from a television broadcast).

Papers

Showing 551-600 of 1209 papers

| Title | Status | Hype |
|---|---|---|
| Augmented Math: Authoring AR-Based Explorable Explanations by Augmenting Static Math Textbooks | Code | 0 |
| Optimizing the Neural Network Training for OCR Error Correction of Historical Hebrew Texts | | 0 |
| Multi-Granularity Prediction with Learnable Fusion for Scene Text Recognition | | 0 |
| MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary | | 0 |
| A comparative analysis of SRGAN models | | 0 |
| Handwritten and Printed Text Segmentation: A Signature Case Study | | 0 |
| Handwritten Text Recognition Using Convolutional Neural Network | | 0 |
| A Novel Pipeline for Improving Optical Character Recognition through Post-processing Using Natural Language Processing | | 0 |
| Artificial Eye for the Blind | | 0 |
| mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding | | 0 |
| Estimating Post-OCR Denoising Complexity on Numerical Texts | | 0 |
| Fraunhofer SIT at CheckThat! 2023: Mixing Single-Modal Classifiers to Estimate the Check-Worthiness of Multi-Modal Tweets | | 0 |
| Resume Information Extraction via Post-OCR Text Processing | | 0 |
| A Survey on Multimodal Large Language Models | | 0 |
| Document Image Cleaning using Budget-Aware Black-Box Approximation | Code | 0 |
| When Vision Fails: Text Attacks Against ViT and OCR | Code | 0 |
| Weakly supervised information extraction from inscrutable handwritten document images | | 0 |
| SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning | Code | 0 |
| Transformer-Based UNet with Multi-Headed Cross-Attention Skip Connections to Eliminate Artifacts in Scanned Documents | | 0 |
| Improving Handwritten OCR with Training Samples Generated by Glyph Conditional Denoising Diffusion Probabilistic Model | | 0 |
| DuoSearch: A Novel Search Engine for Bulgarian Historical Documents | Code | 0 |
| A template-independent approach for information extraction in real estate documents | Code | 0 |
| People and Places of Historical Europe: Bootstrapping Annotation Pipeline and a New Corpus of Named Entities in Late Medieval Texts | | 0 |
| Quantifying Character Similarity with Vision Transformers | Code | 0 |
| DUBLIN -- Document Understanding By Language-Image Network | | 0 |
| Measuring Intersectional Biases in Historical Documents | Code | 0 |
| TextDiffuser: Diffusion Models as Text Painters | | 0 |
| Mobile User Interface Element Detection Via Adaptively Prompt Tuning | Code | 0 |
| Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding | | 0 |
| Combining OCR Models for Reading Early Modern Printed Books | Code | 0 |
| E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation | Code | 0 |
| Text Reading Order in Uncontrolled Conditions by Sparse Graph Segmentation | | 0 |
| Evaluating BERT-based Scientific Relation Classifiers for Scholarly Knowledge Graph Construction on Digital Library Collections | | 0 |
| ICDAR 2023 Competition on Reading the Seal Title | | 0 |
| Multimodal Short Video Rumor Detection System Based on Contrastive Learning | | 0 |
| TransDocs: Optical Character Recognition with word to word translation | Code | 0 |
| Cleansing Jewel: A Neural Spelling Correction Model Built On Google OCR-ed Tibetan Manuscripts | | 0 |
| Linking Representations with Multimodal Contrastive Learning | | 0 |
| A semi-automatic method for document classification in the shipping industry | | 0 |
| OVeNet: Offset Vector Network for Semantic Segmentation | Code | 0 |
| CLIP-ReIdent: Contrastive Training for Player Re-Identification | | 0 |
| Optical Character Recognition and Transcription of Berber Signs from Images in a Low-Resource Language Amazigh | | 0 |
| The System Description of dun_oscar team for The ICPR MSR Challenge | | 0 |
| BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset | Code | 0 |
| Meme Sentiment Analysis Enhanced with Multimodal Spatial Encoding and Facial Embedding | | 0 |
| StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training | Code | 0 |
| Language Is Not All You Need: Aligning Perception with Language Models | | 0 |
| User-Centric Evaluation of OCR Systems for Kwak'wala | | 0 |
| An Investigation into Pre-Training Object-Centric Representations for Reinforcement Learning | | 0 |
| SPARLING: Learning Latent Representations with Extremely Sparse Activations | | 0 |

Benchmark Results

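| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DTrOCR | Accuracy (%) | 89.6 | | Unverified |
| 2 | DTrOCR 105M | Accuracy (%) | 89.6 | | Unverified |
| 3 | MaskOCR-L | Accuracy (%) | 82.6 | | Unverified |
| 4 | TransOCR | Accuracy (%) | 72.8 | | Unverified |
| 5 | SRN | Accuracy (%) | 65 | | Unverified |
| 6 | MORAN | Accuracy (%) | 64.3 | | Unverified |
| 7 | SEED | Accuracy (%) | 61.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4o | Average Accuracy | 76.22 | | Unverified |
| 2 | Gemini-1.5 Pro | Average Accuracy | 76.13 | | Unverified |
| 3 | Claude-3 Sonnet | Average Accuracy | 67.71 | | Unverified |
| 4 | RapidOCR | Average Accuracy | 56.98 | | Unverified |
| 5 | EasyOCR | Average Accuracy | 49.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | STREET | Sequence error | 27.54 | | Unverified |
| 2 | SEE | Sequence error | 22 | | Unverified |
| 3 | AttentionOCR_Inception-resnet-v2_Location | Sequence error | 15.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | I2L-NOPOOL | BLEU | 89.09 | | Unverified |
| 2 | I2L-STRIPS | BLEU | 89 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Tesseract | Character Error Rate (CER) | 0.08 | | Unverified |
| 2 | EasyOCR | Character Error Rate (CER) | 0.07 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | I2L-STRIPS | BLEU | 88.86 | | Unverified |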
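
One of the metrics reported in the benchmark results above is Character Error Rate (CER). This page does not say how the listed CER values were computed, so the following is only a minimal sketch of the commonly used definition: the Levenshtein edit distance between the reference and the OCR output, divided by the reference length. The function names `edit_distance` and `character_error_rate` are illustrative and do not come from any of the listed systems.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER under the assumed definition: edit distance / reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of four reference characters.
    print(character_error_rate("abcd", "abxd"))  # 0.25
```

Under this definition, lower is better, and a CER of 0.08 would correspond to roughly 8 character edits per 100 reference characters. Evaluations may also differ in text normalization (case, whitespace, punctuation), which this sketch does not address.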