SOTAVerified

Optical Character Recognition

Papers

Showing 51–100 of 526 papers

| Title | Status | Hype |
|---|---|---|
| SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild | | 0 |
| Geometry Restoration and Dewarping of Camera-Captured Document Images | Code | 1 |
| Efficient Video-Based ALPR System Using YOLO and Visual Rhythm | Code | 0 |
| Embedding Similarity Guided License Plate Super Resolution | | 0 |
| CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR | | 0 |
| OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning | Code | 4 |
| Optical Character Recognition using Convolutional Neural Networks for Ashokan Brahmi Inscriptions | | 0 |
| Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study | Code | 0 |
| ERPA: Efficient RPA Model Integrating OCR and LLMs for Intelligent Document Processing | | 0 |
| VORTEX: A Spatial Computing Framework for Optimized Drone Telemetry Extraction from First-Person View Flight Data | | 0 |
| Leveraging Deep Learning with Multi-Head Attention for Accurate Extraction of Medicine from Handwritten Prescriptions | | 0 |
| LMV-RPA: Large Model Voting-based Robotic Process Automation | Code | 0 |
| Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts | Code | 0 |
| RoundTripOCR: A Data Generation Technique for Enhancing Post-OCR Error Correction in Low-Resource Devanagari Languages | Code | 0 |
| Advancing Vehicle Plate Recognition: Multitasking Visual Language Models with VehiclePaliGemma | | 0 |
| Enhancement of text recognition for hanja handwritten documents of Ancient Korea | | 0 |
| DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Code | 9 |
| POINTS1.5: Building a Vision-Language Model towards Real World Applications | | 0 |
| Aligned Music Notation and Lyrics Transcription | Code | 0 |
| Text Change Detection in Multilingual Documents Using Image Comparison | | 0 |
| Patchfinder: Leveraging Visual Language Models for Accurate Information Retrieval using Model Uncertainty | | 0 |
| OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation | Code | 2 |
| AI-assisted summary of suicide risk Formulation | | 0 |
| Towards Accessible Learning: Deep Learning-Based Potential Dysgraphia Detection and OCR for Potentially Dysgraphic Handwriting | | 0 |
| DriveThru: a Document Extraction Platform and Benchmark Datasets for Indonesian Local Language Archives | Code | 0 |
| TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models | | 0 |
| M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | | 0 |
| Handwriting Recognition in Historical Documents with Multimodal LLM | | 0 |
| Toxicity of the Commons: Curating Open-Source Pre-Training Data | Code | 1 |
| Are VLMs Really Blind | Code | 0 |
| Comparison of Image Preprocessing Techniques for Vehicle License Plate Recognition Using OCR: Performance and Accuracy Evaluation | | 0 |
| ChartKG: A Knowledge-Graph-Based Representation for Chart Images | | 0 |
| MIRAGE: Multimodal Identification and Recognition of Annotations in Indian General Prescriptions | | 0 |
| Hespi: A pipeline for automatically detecting information from hebarium specimen sheets | Code | 1 |
| JaPOC: Japanese Post-OCR Correction Benchmark using Vouchers | | 0 |
| See then Tell: Enhancing Key Information Extraction with Vision Grounding | | 0 |
| CodeSCAN: ScreenCast ANalysis for Video Programming Tutorials | | 0 |
| MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features | Code | 0 |
| @Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology | | 0 |
| Computer Vision Intelligence Test Modeling and Generation: A Case Study on Smart OCR | | 0 |
| ICDAR 2024 Competition on Few-Shot and Many-Shot Layout Segmentation of Ancient Manuscripts (SAM) | | 0 |
| PdfTable: A Unified Toolkit for Deep Learning-Based Table Extraction | Code | 0 |
| POINTS: Improving Your Vision-language Model with Affordable Strategies | | 0 |
| Confidence-Aware Document OCR Error Detection | | 0 |
| Post-OCR Text Correction for Bulgarian Historical Documents | Code | 0 |
| CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models | Code | 0 |
| Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders | Code | 4 |
| Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail | | 0 |
| Enhancing License Plate Super-Resolution: A Layout-Aware and Character-Driven Approach | Code | 1 |
| FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting | Code | 0 |

No leaderboard results yet.