| Paper | Date | Tasks | Code | Stars |
|---|---|---|---|---|
| Towards Accessible Learning: Deep Learning-Based Potential Dysgraphia Detection and OCR for Potentially Dysgraphic Handwriting | Nov 18, 2024 | Diagnostic, Optical Character Recognition | Unverified | 0 |
| DriveThru: a Document Extraction Platform and Benchmark Datasets for Indonesian Local Language Archives | Nov 14, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models | Nov 7, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | Nov 7, 2024 | document understanding, Optical Character Recognition | Unverified | 0 |
| Handwriting Recognition in Historical Documents with Multimodal LLM | Oct 31, 2024 | Handwriting Recognition, Optical Character Recognition | Unverified | 0 |
| Are VLMs Really Blind | Oct 29, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Comparison of Image Preprocessing Techniques for Vehicle License Plate Recognition Using OCR: Performance and Accuracy Evaluation | Oct 15, 2024 | License Plate Recognition, Optical Character Recognition | Unverified | 0 |
| ChartKG: A Knowledge-Graph-Based Representation for Chart Images | Oct 13, 2024 | Chart Question Answering, Knowledge Graph Completion | Unverified | 0 |
| MIRAGE: Multimodal Identification and Recognition of Annotations in Indian General Prescriptions | Oct 13, 2024 | Handwriting Recognition, Optical Character Recognition | Unverified | 0 |
| JaPOC: Japanese Post-OCR Correction Benchmark using Vouchers | Sep 30, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| See then Tell: Enhancing Key Information Extraction with Vision Grounding | Sep 29, 2024 | Image to text, Key Information Extraction | Unverified | 0 |
| CodeSCAN: ScreenCast ANalysis for Video Programming Tutorials | Sep 27, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features | Sep 25, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| @Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology | Sep 21, 2024 | Benchmarking, Depth Estimation | Unverified | 0 |
| Computer Vision Intelligence Test Modeling and Generation: A Case Study on Smart OCR | Sep 14, 2024 | 3D Classification, Optical Character Recognition | Unverified | 0 |
| ICDAR 2024 Competition on Few-Shot and Many-Shot Layout Segmentation of Ancient Manuscripts (SAM) | Sep 11, 2024 | Diversity, Document Layout Analysis | Unverified | 0 |
| PdfTable: A Unified Toolkit for Deep Learning-Based Table Extraction | Sep 8, 2024 | Deep Learning, Document Layout Analysis | Code Available | 0 |
| POINTS: Improving Your Vision-language Model with Affordable Strategies | Sep 7, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Confidence-Aware Document OCR Error Detection | Sep 6, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Post-OCR Text Correction for Bulgarian Historical Documents | Aug 31, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models | Aug 30, 2024 | Articles, named-entity-recognition | Code Available | 0 |
| Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail | Aug 28, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| A Permuted Autoregressive Approach to Word-Level Recognition for Urdu Digital Text | Aug 27, 2024 | Data Augmentation, Optical Character Recognition | Unverified | 0 |
| FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting | Aug 27, 2024 | Benchmarking, Decoder | Code Available | 0 |
| Knowledge Discovery in Optical Music Recognition: Enhancing Information Retrieval with Instance Segmentation | Aug 27, 2024 | Information Retrieval, Instance Segmentation | Unverified | 0 |
| Ancient but Digitized: Developing Handwritten Optical Character Recognition for East Syriac Script Through Creating KHAMIS Dataset | Aug 24, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | Aug 22, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Large Language Models for Page Stream Segmentation | Aug 21, 2024 | Decoder, Optical Character Recognition | Unverified | 0 |
| Revisiting Multi-Modal LLM Evaluation | Aug 9, 2024 | Chart Understanding, Optical Character Recognition | Unverified | 0 |
| Handwritten Code Recognition for Pen-and-Paper CS Education | Aug 7, 2024 | Hallucination, Language Modeling | Code Available | 0 |
| PIXELMOD: Improving Soft Moderation of Visual Misleading Information on Twitter | Jul 30, 2024 | Misinformation, Optical Character Recognition | Code Available | 0 |
| Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation | Jul 26, 2024 | named-entity-recognition, Named Entity Recognition | Unverified | 0 |
| ChatSchema: A pipeline of extracting structured information with Large Multimodal Models based on schema | Jul 26, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| PLayerTV: Advanced Player Tracking and Identification for Automatic Soccer Highlight Clips | Jul 22, 2024 | object-detection, Object Detection | Unverified | 0 |
| Qalam : A Multimodal LLM for Arabic Optical Character and Handwriting Recognition | Jul 18, 2024 | Decoder, Handwriting Recognition | Unverified | 0 |
| Task-driven single-image super-resolution reconstruction of document scans | Jul 12, 2024 | Image Super-Resolution, Optical Character Recognition | Unverified | 0 |
| Toward accessible comics for blind and low vision readers | Jul 11, 2024 | Optical Character Recognition, Prompt Engineering | Unverified | 0 |
| Spanish TrOCR: Leveraging Transfer Learning for Language Adaptation | Jul 9, 2024 | Decoder, Image Generation | Code Available | 0 |
| High-Throughput Phenotyping using Computer Vision and Machine Learning | Jul 8, 2024 | Image Segmentation, Optical Character Recognition | Code Available | 0 |
| Optimizing Nepali PDF Extraction: A Comparative Study of Parser and OCR Technologies | Jul 5, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| Mind the Gap: Analyzing Lacunae with Transformer-Based Transcription | Jun 28, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst | Jun 14, 2024 | Image Captioning, Language Modeling | Unverified | 0 |
| M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation | Jun 12, 2024 | Document Level Machine Translation, Document Translation | Code Available | 0 |
| Scaling Automatic Extraction of Pseudocode | Jun 7, 2024 | Code Generation, Optical Character Recognition | Unverified | 0 |
| Generalized Jersey Number Recognition Using Multi-task Learning With Orientation-guided Weight Refinement | Jun 3, 2024 | Jersey Number Recognition, Multi-Task Learning | Unverified | 0 |
| Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities | May 25, 2024 | Boundary Detection, Optical Character Recognition | Unverified | 0 |
| Transfer Learning Approach for Railway Technical Map (RTM) Component Identification | May 21, 2024 | Management, object-detection | Unverified | 0 |
| GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding | May 6, 2024 | Contrastive Learning, document understanding | Code Available | 0 |
| DELINE8K: A Synthetic Data Pipeline for the Semantic Segmentation of Historical Documents | Apr 30, 2024 | 8k, Diversity | Code Available | 0 |
| Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Apr 29, 2024 | document understanding, GPU | Code Available | 0 |