SOTAVerified

Optical Character Recognition

Papers

Showing 151–200 of 526 papers

Title | Status | Hype
Towards Accessible Learning: Deep Learning-Based Potential Dysgraphia Detection and OCR for Potentially Dysgraphic Handwriting | | 0
DriveThru: a Document Extraction Platform and Benchmark Datasets for Indonesian Local Language Archives | Code | 0
TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models | | 0
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | | 0
Handwriting Recognition in Historical Documents with Multimodal LLM | | 0
Are VLMs Really Blind | Code | 0
Comparison of Image Preprocessing Techniques for Vehicle License Plate Recognition Using OCR: Performance and Accuracy Evaluation | | 0
ChartKG: A Knowledge-Graph-Based Representation for Chart Images | | 0
MIRAGE: Multimodal Identification and Recognition of Annotations in Indian General Prescriptions | | 0
JaPOC: Japanese Post-OCR Correction Benchmark using Vouchers | | 0
See then Tell: Enhancing Key Information Extraction with Vision Grounding | | 0
CodeSCAN: ScreenCast ANalysis for Video Programming Tutorials | | 0
MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features | Code | 0
@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology | | 0
Computer Vision Intelligence Test Modeling and Generation: A Case Study on Smart OCR | | 0
ICDAR 2024 Competition on Few-Shot and Many-Shot Layout Segmentation of Ancient Manuscripts (SAM) | | 0
PdfTable: A Unified Toolkit for Deep Learning-Based Table Extraction | Code | 0
POINTS: Improving Your Vision-language Model with Affordable Strategies | | 0
Confidence-Aware Document OCR Error Detection | | 0
Post-OCR Text Correction for Bulgarian Historical Documents | Code | 0
CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models | Code | 0
Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail | | 0
A Permuted Autoregressive Approach to Word-Level Recognition for Urdu Digital Text | | 0
FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting | Code | 0
Knowledge Discovery in Optical Music Recognition: Enhancing Information Retrieval with Instance Segmentation | | 0
Ancient but Digitized: Developing Handwritten Optical Character Recognition for East Syriac Script Through Creating KHAMIS Dataset | | 0
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | | 0
Large Language Models for Page Stream Segmentation | | 0
Revisiting Multi-Modal LLM Evaluation | | 0
Handwritten Code Recognition for Pen-and-Paper CS Education | Code | 0
PIXELMOD: Improving Soft Moderation of Visual Misleading Information on Twitter | Code | 0
Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation | | 0
ChatSchema: A pipeline of extracting structured information with Large Multimodal Models based on schema | | 0
PLayerTV: Advanced Player Tracking and Identification for Automatic Soccer Highlight Clips | | 0
Qalam : A Multimodal LLM for Arabic Optical Character and Handwriting Recognition | | 0
Task-driven single-image super-resolution reconstruction of document scans | | 0
Toward accessible comics for blind and low vision readers | | 0
Spanish TrOCR: Leveraging Transfer Learning for Language Adaptation | Code | 0
High-Throughput Phenotyping using Computer Vision and Machine Learning | Code | 0
Optimizing Nepali PDF Extraction: A Comparative Study of Parser and OCR Technologies | Code | 0
Mind the Gap: Analyzing Lacunae with Transformer-Based Transcription | | 0
OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst | | 0
M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation | Code | 0
Scaling Automatic Extraction of Pseudocode | | 0
Generalized Jersey Number Recognition Using Multi-task Learning With Orientation-guided Weight Refinement | | 0
Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities | | 0
Transfer Learning Approach for Railway Technical Map (RTM) Component Identification | | 0
GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding | Code | 0
DELINE8K: A Synthetic Data Pipeline for the Semantic Segmentation of Historical Documents | Code | 0
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Code | 0

No leaderboard results yet.