SOTAVerified

Optical Character Recognition

Papers

Showing 26–50 of 526 papers

Title | Status | Hype
Lost in OCR Translation? Vision-Based Approaches to Robust Document Retrieval |  | 0
ChemRxivQuest: A Curated Chemistry Question-Answer Database Extracted from ChemRxiv Preprints |  | 0
DOTA: Deformable Optimized Transformer Architecture for End-to-End Text Recognition with Retrieval-Augmented Generation |  | 0
Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer |  | 0
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models | Code | 0
Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR |  | 0
Relation-Rich Visual Document Generator for Visual Information Extraction | Code | 0
NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding |  | 0
Towards Calibration Enhanced Network by Inverse Adversarial Attack |  | 0
Playing Non-Embedded Card-Based Games with Reinforcement Learning | Code | 3
Multimodal LLMs for OCR, OCR Post-Correction, and Named Entity Recognition in Historical Documents | Code | 1
Context-Independent OCR with Multimodal LLMs: Effects of Image Resolution and Visual Complexity |  | 0
TFIC: End-to-End Text-Focused Image Compression for Coding for Machines |  | 0
AI-Driven Multi-Stage Computer Vision System for Defect Detection in Laser-Engraved Industrial Nameplates |  | 0
Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription | Code | 0
MultiOCR-QA: Dataset for Evaluating Robustness of LLMs in Question Answering on Multilingual OCR Texts | Code | 0
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding |  | 0
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models | Code | 0
Visual Graph Question Answering with ASP and LLMs for Language Parsing |  | 0
Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments | Code | 1
Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents |  | 0
LoCoML: A Framework for Real-World ML Inference Pipelines |  | 0
Exploring AI-based System Design for Pixel-level Protected Health Information Detection in Medical Images |  | 0
Comparative analysis of optical character recognition methods for Sámi texts from the National Library of Norway | Code | 0
Efficient License Plate Recognition in Videos Using Visual Rhythm and Accumulative Line Analysis | Code | 0
Page 2 of 22

No leaderboard results yet.