| SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild | Jan 6, 2025 | Attribute, Optical Character Recognition | Unverified | 0 |
| Geometry Restoration and Dewarping of Camera-Captured Document Images | Jan 6, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 1 |
| Efficient Video-Based ALPR System Using YOLO and Visual Rhythm | Jan 4, 2025 | License Plate Recognition, Optical Character Recognition | Code Available | 0 |
| Embedding Similarity Guided License Plate Super Resolution | Jan 2, 2025 | License Plate Recognition, Optical Character Recognition | Unverified | 0 |
| CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR | Jan 1, 2025 | All, Optical Character Recognition | Unverified | 0 |
| OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning | Dec 31, 2024 | Benchmarking, Logical Reasoning | Code Available | 4 |
| Optical Character Recognition using Convolutional Neural Networks for Ashokan Brahmi Inscriptions | Dec 29, 2024 | Data Augmentation, Image Segmentation | Unverified | 0 |
| Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study | Dec 29, 2024 | Motion Detection, Optical Character Recognition | Code Available | 0 |
| ERPA: Efficient RPA Model Integrating OCR and LLMs for Intelligent Document Processing | Dec 24, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| VORTEX: A Spatial Computing Framework for Optimized Drone Telemetry Extraction from First-Person View Flight Data | Dec 24, 2024 | Computational Efficiency, Optical Character Recognition | Unverified | 0 |
| Leveraging Deep Learning with Multi-Head Attention for Accurate Extraction of Medicine from Handwritten Prescriptions | Dec 24, 2024 | Optical Character Recognition | Unverified | 0 |
| LMV-RPA: Large Model Voting-based Robotic Process Automation | Dec 23, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts | Dec 20, 2024 | Benchmarking, Optical Character Recognition | Code Available | 0 |
| RoundTripOCR: A Data Generation Technique for Enhancing Post-OCR Error Correction in Low-Resource Devanagari Languages | Dec 14, 2024 | Machine Translation, Optical Character Recognition | Code Available | 0 |
| Advancing Vehicle Plate Recognition: Multitasking Visual Language Models with VehiclePaliGemma | Dec 14, 2024 | GPU, License Plate Recognition | Unverified | 0 |
| Enhancement of text recognition for hanja handwritten documents of Ancient Korea | Dec 14, 2024 | Data Augmentation, object-detection | Unverified | 0 |
| DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Dec 13, 2024 | Chart Understanding, Mixture-of-Experts | Code Available | 9 |
| POINTS1.5: Building a Vision-Language Model towards Real World Applications | Dec 11, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Aligned Music Notation and Lyrics Transcription | Dec 5, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Text Change Detection in Multilingual Documents Using Image Comparison | Dec 5, 2024 | Binarization, Change Detection | Unverified | 0 |
| Patchfinder: Leveraging Visual Language Models for Accurate Information Retrieval using Model Uncertainty | Dec 3, 2024 | Information Retrieval, Optical Character Recognition | Unverified | 0 |
| OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation | Dec 3, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 2 |
| AI-assisted summary of suicide risk Formulation | Nov 29, 2024 | Optical Character Recognition | Unverified | 0 |
| Towards Accessible Learning: Deep Learning-Based Potential Dysgraphia Detection and OCR for Potentially Dysgraphic Handwriting | Nov 18, 2024 | Diagnostic, Optical Character Recognition | Unverified | 0 |
| DriveThru: a Document Extraction Platform and Benchmark Datasets for Indonesian Local Language Archives | Nov 14, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models | Nov 7, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | Nov 7, 2024 | document understanding, Optical Character Recognition | Unverified | 0 |
| Handwriting Recognition in Historical Documents with Multimodal LLM | Oct 31, 2024 | Handwriting Recognition, Optical Character Recognition | Unverified | 0 |
| Toxicity of the Commons: Curating Open-Source Pre-Training Data | Oct 29, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 1 |
| Are VLMs Really Blind | Oct 29, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Comparison of Image Preprocessing Techniques for Vehicle License Plate Recognition Using OCR: Performance and Accuracy Evaluation | Oct 15, 2024 | License Plate Recognition, Optical Character Recognition | Unverified | 0 |
| ChartKG: A Knowledge-Graph-Based Representation for Chart Images | Oct 13, 2024 | Chart Question Answering, Knowledge Graph Completion | Unverified | 0 |
| MIRAGE: Multimodal Identification and Recognition of Annotations in Indian General Prescriptions | Oct 13, 2024 | Handwriting Recognition, Optical Character Recognition | Unverified | 0 |
| Hespi: A pipeline for automatically detecting information from hebarium specimen sheets | Oct 11, 2024 | Handwritten Text Recognition, HTR | Code Available | 1 |
| JaPOC: Japanese Post-OCR Correction Benchmark using Vouchers | Sep 30, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| See then Tell: Enhancing Key Information Extraction with Vision Grounding | Sep 29, 2024 | Image to text, Key Information Extraction | Unverified | 0 |
| CodeSCAN: ScreenCast ANalysis for Video Programming Tutorials | Sep 27, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features | Sep 25, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| @Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology | Sep 21, 2024 | Benchmarking, Depth Estimation | Unverified | 0 |
| Computer Vision Intelligence Test Modeling and Generation: A Case Study on Smart OCR | Sep 14, 2024 | 3D Classification, Optical Character Recognition | Unverified | 0 |
| ICDAR 2024 Competition on Few-Shot and Many-Shot Layout Segmentation of Ancient Manuscripts (SAM) | Sep 11, 2024 | Diversity, Document Layout Analysis | Unverified | 0 |
| PdfTable: A Unified Toolkit for Deep Learning-Based Table Extraction | Sep 8, 2024 | Deep Learning, Document Layout Analysis | Code Available | 0 |
| POINTS: Improving Your Vision-language Model with Affordable Strategies | Sep 7, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Confidence-Aware Document OCR Error Detection | Sep 6, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Post-OCR Text Correction for Bulgarian Historical Documents | Aug 31, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models | Aug 30, 2024 | Articles, named-entity-recognition | Code Available | 0 |
| Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders | Aug 28, 2024 | Optical Character Recognition | Code Available | 4 |
| Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail | Aug 28, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Enhancing License Plate Super-Resolution: A Layout-Aware and Character-Driven Approach | Aug 27, 2024 | License Plate Recognition, Optical Character Recognition | Code Available | 1 |
| FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting | Aug 27, 2024 | Benchmarking, Decoder | Code Available | 0 |