SOTAVerified

Optical Character Recognition

Papers

Showing 101-150 of 526 papers

Title | Status | Hype
How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads |  | 0
Every Pixel Tells a Story: End-to-End Urdu Newspaper OCR |  | 0
Low-Resource Language Processing: An OCR-Driven Summarization and Translation Pipeline | Code | 0
PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language | Code | 0
A document processing pipeline for the construction of a dataset for topic modeling based on the judgments of the Italian Supreme Court |  | 0
Reproducibility, Replicability, and Insights into Visual Document Retrieval with Late Interaction | Code | 0
Development of a WAZOBIA-Named Entity Recognition System |  | 0
Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding | Code | 0
Toward Advancing License Plate Super-Resolution in Real-World Scenarios: A Dataset and Benchmark | Code | 0
ChemRxivQuest: A Curated Chemistry Question-Answer Database Extracted from ChemRxiv Preprints |  | 0
Lost in OCR Translation? Vision-Based Approaches to Robust Document Retrieval |  | 0
DOTA: Deformable Optimized Transformer Architecture for End-to-End Text Recognition with Retrieval-Augmented Generation |  | 0
Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer |  | 0
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models | Code | 0
Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR |  | 0
Relation-Rich Visual Document Generator for Visual Information Extraction | Code | 0
NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding |  | 0
Towards Calibration Enhanced Network by Inverse Adversarial Attack |  | 0
Context-Independent OCR with Multimodal LLMs: Effects of Image Resolution and Visual Complexity |  | 0
TFIC: End-to-End Text-Focused Image Compression for Coding for Machines |  | 0
AI-Driven Multi-Stage Computer Vision System for Defect Detection in Laser-Engraved Industrial Nameplates |  | 0
Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription | Code | 0
MultiOCR-QA: Dataset for Evaluating Robustness of LLMs in Question Answering on Multilingual OCR Texts | Code | 0
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding |  | 0
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models | Code | 0
Visual Graph Question Answering with ASP and LLMs for Language Parsing |  | 0
Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents |  | 0
LoCoML: A Framework for Real-World ML Inference Pipelines |  | 0
Exploring AI-based System Design for Pixel-level Protected Health Information Detection in Medical Images |  | 0
Comparative analysis of optical character recognition methods for Sámi texts from the National Library of Norway | Code | 0
Efficient License Plate Recognition in Videos Using Visual Rhythm and Accumulative Line Analysis | Code | 0
SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild |  | 0
Efficient Video-Based ALPR System Using YOLO and Visual Rhythm | Code | 0
Embedding Similarity Guided License Plate Super Resolution |  | 0
CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR |  | 0
Optical Character Recognition using Convolutional Neural Networks for Ashokan Brahmi Inscriptions |  | 0
Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study | Code | 0
VORTEX: A Spatial Computing Framework for Optimized Drone Telemetry Extraction from First-Person View Flight Data |  | 0
Leveraging Deep Learning with Multi-Head Attention for Accurate Extraction of Medicine from Handwritten Prescriptions |  | 0
ERPA: Efficient RPA Model Integrating OCR and LLMs for Intelligent Document Processing |  | 0
LMV-RPA: Large Model Voting-based Robotic Process Automation | Code | 0
Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts | Code | 0
Advancing Vehicle Plate Recognition: Multitasking Visual Language Models with VehiclePaliGemma |  | 0
RoundTripOCR: A Data Generation Technique for Enhancing Post-OCR Error Correction in Low-Resource Devanagari Languages | Code | 0
Enhancement of text recognition for hanja handwritten documents of Ancient Korea |  | 0
POINTS1.5: Building a Vision-Language Model towards Real World Applications |  | 0
Aligned Music Notation and Lyrics Transcription | Code | 0
Text Change Detection in Multilingual Documents Using Image Comparison |  | 0
Patchfinder: Leveraging Visual Language Models for Accurate Information Retrieval using Model Uncertainty |  | 0
AI-assisted summary of suicide risk Formulation |  | 0
Page 3 of 11

No leaderboard results yet.