SOTAVerified

Optical Character Recognition

Papers

Showing 51-100 of 526 papers

Title | Status | Hype
Persis: A Persian Font Recognition Pipeline Using Convolutional Neural Networks | Code | 1
Geometry Restoration and Dewarping of Camera-Captured Document Images | Code | 1
ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images | Code | 1
WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition | Code | 1
T-MARS: Improving Visual Representations by Circumventing Text Feature Learning | Code | 1
On the Cross-dataset Generalization in License Plate Recognition | Code | 1
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents | Code | 1
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval | Code | 1
bbOCR: An Open-source Multi-domain OCR Pipeline for Bengali Documents | Code | 1
Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter | Code | 1
Iranis: A Large-scale Dataset of Farsi License Plate Characters | Code | 1
Super-Resolution of License Plate Images Using Attention Modules and Sub-Pixel Convolution Layers | Code | 1
ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting | Code | 1
CORU: Comprehensive Post-OCR Parsing and Receipt Understanding Dataset | Code | 1
Data Generation for Post-OCR correction of Cyrillic handwriting | Code | 1
A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition | Code | 1
Operationalizing a National Digital Library: The Case for a Norwegian Transformer Model | Code | 1
Multi-Type-TD-TSR -- Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition: from OCR to Structured Table Representations | Code | 1
Multimodal LLMs for OCR, OCR Post-Correction, and Named Entity Recognition in Historical Documents | Code | 1
An Empirical Study of Scaling Law for OCR | Code | 1
A Two-Step Approach for Automatic OCR Post-Correction | Code | 1
A Large Multi-Target Dataset of Common Bengali Handwritten Graphemes | Code | 1
BankNote-Net: Open dataset for assistive universal currency recognition | Code | 1
Confidence-aware Non-repetitive Multimodal Transformers for TextCaps | Code | 1
Neural OCR Post-Hoc Correction of Historical Corpora | Code | 1
MCSCSet: A Specialist-annotated Dataset for Medical-domain Chinese Spelling Correction | Code | 1
LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images? | Code | 1
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation | Code | 1
Detection of Furigana Text in Images | Code | 1
Combining Morphological and Histogram based Text Line Segmentation in the OCR Context | Code | 1
Fully Unsupervised Diversity Denoising with Convolutional Variational Autoencoders | Code | 1
DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents | Code | 1
Lights, Camera, Action! A Framework to Improve NLP Accuracy over OCR documents | Code | 1
Meta-Album: Multi-domain Meta-Dataset for Few-Shot Image Classification | Code | 1
PEaCE: A Chemistry-Oriented Dataset for Optical Character Recognition on Scientific Documents | Code | 1
Boosting on the shoulders of giants in quantum device calibration | Code | 1
Toxicity of the Commons: Curating Open-Source Pre-Training Data | Code | 1
Enhancing License Plate Super-Resolution: A Layout-Aware and Character-Driven Approach | Code | 1
It Takes Two to Tango: Combining Visual and Textual Information for Detecting Duplicate Video-Based Bug Reports | Code | 0
Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription | Code | 0
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision | Code | 0
ASTER: An Attentional Scene Text Recognizer with Flexible Rectification | Code | 0
A Skip-connected Multi-column Network for Isolated Handwritten Bangla Character and Digit recognition | Code | 0
iExam: A Novel Online Exam Monitoring and Analysis System Based on Face Detection and Recognition | Code | 0
High-Throughput Phenotyping using Computer Vision and Machine Learning | Code | 0
IDPL-PFOD2: A New Large-Scale Dataset for Printed Farsi Optical Character Recognition | Code | 0
Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding | Code | 0
Are VLMs Really Blind | Code | 0
Advancing Multilingual Handwritten Numeral Recognition with Attention-driven Transfer Learning | Code | 0
A model of diffuse Galactic Radio Emission from 10 MHz to 100 GHz | Code | 0
Page 2 of 11

No leaderboard results yet.