SOTAVerified

Optical Character Recognition

Papers

Showing 1-50 of 526 papers

| Title | Status | Hype |
| --- | --- | --- |
| Seeing the Signs: A Survey of Edge-Deployable OCR Models for Billboard Visibility Analysis | | 0 |
| A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends | | 0 |
| Logios : An open source Greek Polytonic Optical Character Recognition system | | 0 |
| Unfolding the Past: A Comprehensive Deep Learning Approach to Analyzing Incunabula Pages | | 0 |
| An accurate and revised version of optical character recognition-based speech synthesis using LabVIEW | | 0 |
| Intelligent Automation for FDI Facilitation: Optimizing Tariff Exemption Processes with OCR And Large Language Models | | 0 |
| Task-driven real-world super-resolution of document scans | | 0 |
| Reading in the Dark with Foveated Event Vision | | 0 |
| MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories | Code | 2 |
| SARD: A Large-Scale Synthetic Arabic OCR Dataset for Book-Style Text Recognition | | 0 |
| Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition | Code | 1 |
| TextSR: Diffusion Super-Resolution with Multilingual OCR Guidance | | 0 |
| MT^3: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning | | 0 |
| Words as Geometric Features: Estimating Homography using Optical Character Recognition as Compressed Image Representation | | 0 |
| How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads | | 0 |
| Every Pixel Tells a Story: End-to-End Urdu Newspaper OCR | | 0 |
| Reasoning-OCR: Can Large Multimodal Models Solve Complex Logical Reasoning Problems from OCR Cues? | Code | 1 |
| LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images? | Code | 1 |
| Low-Resource Language Processing: An OCR-Driven Summarization and Translation Pipeline | Code | 0 |
| PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language | Code | 0 |
| A document processing pipeline for the construction of a dataset for topic modeling based on the judgments of the Italian Supreme Court | | 0 |
| Reproducibility, Replicability, and Insights into Visual Document Retrieval with Late Interaction | Code | 0 |
| Development of a WAZOBIA-Named Entity Recognition System | | 0 |
| Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding | Code | 0 |
| Toward Advancing License Plate Super-Resolution in Real-World Scenarios: A Dataset and Benchmark | Code | 0 |
| Lost in OCR Translation? Vision-Based Approaches to Robust Document Retrieval | | 0 |
| ChemRxivQuest: A Curated Chemistry Question-Answer Database Extracted from ChemRxiv Preprints | | 0 |
| DOTA: Deformable Optimized Transformer Architecture for End-to-End Text Recognition with Retrieval-Augmented Generation | | 0 |
| Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer | | 0 |
| Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models | Code | 0 |
| Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR | | 0 |
| Relation-Rich Visual Document Generator for Visual Information Extraction | Code | 0 |
| NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | | 0 |
| Towards Calibration Enhanced Network by Inverse Adversarial Attack | | 0 |
| Playing Non-Embedded Card-Based Games with Reinforcement Learning | Code | 3 |
| Multimodal LLMs for OCR, OCR Post-Correction, and Named Entity Recognition in Historical Documents | Code | 1 |
| Context-Independent OCR with Multimodal LLMs: Effects of Image Resolution and Visual Complexity | | 0 |
| TFIC: End-to-End Text-Focused Image Compression for Coding for Machines | | 0 |
| AI-Driven Multi-Stage Computer Vision System for Defect Detection in Laser-Engraved Industrial Nameplates | | 0 |
| Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription | Code | 0 |
| MultiOCR-QA: Dataset for Evaluating Robustness of LLMs in Question Answering on Multilingual OCR Texts | Code | 0 |
| KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding | | 0 |
| Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models | Code | 0 |
| Visual Graph Question Answering with ASP and LLMs for Language Parsing | | 0 |
| Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments | Code | 1 |
| Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents | | 0 |
| LoCoML: A Framework for Real-World ML Inference Pipelines | | 0 |
| Exploring AI-based System Design for Pixel-level Protected Health Information Detection in Medical Images | | 0 |
| Comparative analysis of optical character recognition methods for Sámi texts from the National Library of Norway | Code | 0 |
| Efficient License Plate Recognition in Videos Using Visual Rhythm and Accumulative Line Analysis | Code | 0 |

No leaderboard results yet.