SOTAVerified

Optical Character Recognition

Papers

Showing 76-100 of 526 papers

| Title | Status | Hype |
| --- | --- | --- |
| TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models | | 0 |
| M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | | 0 |
| Handwriting Recognition in Historical Documents with Multimodal LLM | | 0 |
| Toxicity of the Commons: Curating Open-Source Pre-Training Data | Code | 1 |
| Are VLMs Really Blind | Code | 0 |
| Comparison of Image Preprocessing Techniques for Vehicle License Plate Recognition Using OCR: Performance and Accuracy Evaluation | | 0 |
| ChartKG: A Knowledge-Graph-Based Representation for Chart Images | | 0 |
| MIRAGE: Multimodal Identification and Recognition of Annotations in Indian General Prescriptions | | 0 |
| Hespi: A pipeline for automatically detecting information from hebarium specimen sheets | Code | 1 |
| JaPOC: Japanese Post-OCR Correction Benchmark using Vouchers | | 0 |
| See then Tell: Enhancing Key Information Extraction with Vision Grounding | | 0 |
| CodeSCAN: ScreenCast ANalysis for Video Programming Tutorials | | 0 |
| MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features | Code | 0 |
| @Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology | | 0 |
| Computer Vision Intelligence Test Modeling and Generation: A Case Study on Smart OCR | | 0 |
| ICDAR 2024 Competition on Few-Shot and Many-Shot Layout Segmentation of Ancient Manuscripts (SAM) | | 0 |
| PdfTable: A Unified Toolkit for Deep Learning-Based Table Extraction | Code | 0 |
| POINTS: Improving Your Vision-language Model with Affordable Strategies | | 0 |
| Confidence-Aware Document OCR Error Detection | | 0 |
| Post-OCR Text Correction for Bulgarian Historical Documents | Code | 0 |
| CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models | Code | 0 |
| Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders | Code | 4 |
| Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail | | 0 |
| Enhancing License Plate Super-Resolution: A Layout-Aware and Character-Driven Approach | Code | 1 |
| FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting | Code | 0 |

No leaderboard results yet.