| How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads | May 21, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Every Pixel Tells a Story: End-to-End Urdu Newspaper OCR | May 20, 2025 | Articles, Image Super-Resolution | Unverified | 0 |
| Low-Resource Language Processing: An OCR-Driven Summarization and Translation Pipeline | May 16, 2025 | Abstractive Text Summarization, Language Modeling | Code Available | 0 |
| PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language | May 15, 2025 | Benchmarking, Optical Character Recognition | Code Available | 0 |
| A document processing pipeline for the construction of a dataset for topic modeling based on the judgments of the Italian Supreme Court | May 13, 2025 | Diversity, Document Layout Analysis | Unverified | 0 |
| Reproducibility, Replicability, and Insights into Visual Document Retrieval with Late Interaction | May 12, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| Development of a WAZOBIA-Named Entity Recognition System | May 10, 2025 | Machine Translation, named-entity-recognition | Unverified | 0 |
| Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding | May 9, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| Toward Advancing License Plate Super-Resolution in Real-World Scenarios: A Dataset and Benchmark | May 9, 2025 | License Plate Recognition, Optical Character Recognition | Code Available | 0 |
| ChemRxivQuest: A Curated Chemistry Question-Answer Database Extracted from ChemRxiv Preprints | May 8, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Lost in OCR Translation? Vision-Based Approaches to Robust Document Retrieval | May 8, 2025 | Computational Efficiency, Optical Character Recognition | Unverified | 0 |
| DOTA: Deformable Optimized Transformer Architecture for End-to-End Text Recognition with Retrieval-Augmented Generation | May 7, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer | May 2, 2025 | document understanding, Hallucination | Unverified | 0 |
| Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models | Apr 16, 2025 | document understanding, Layout Design | Code Available | 0 |
| Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR | Apr 15, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Relation-Rich Visual Document Generator for Visual Information Extraction | Apr 14, 2025 | Diversity, document understanding | Code Available | 0 |
| NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | Apr 12, 2025 | Benchmarking, Document AI | Unverified | 0 |
| Towards Calibration Enhanced Network by Inverse Adversarial Attack | Apr 8, 2025 | Adversarial Attack, Optical Character Recognition | Unverified | 0 |
| Context-Independent OCR with Multimodal LLMs: Effects of Image Resolution and Visual Complexity | Mar 31, 2025 | Image Captioning, Optical Character Recognition | Unverified | 0 |
| TFIC: End-to-End Text-Focused Image Compression for Coding for Machines | Mar 25, 2025 | Image Compression, Optical Character Recognition | Unverified | 0 |
| AI-Driven Multi-Stage Computer Vision System for Defect Detection in Laser-Engraved Industrial Nameplates | Mar 5, 2025 | Anomaly Detection, Defect Detection | Unverified | 0 |
| Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription | Feb 27, 2025 | Handwritten Text Recognition, HTR | Code Available | 0 |
| MultiOCR-QA: Dataset for Evaluating Robustness of LLMs in Question Answering on Multilingual OCR Texts | Feb 24, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding | Feb 20, 2025 | document understanding, Optical Character Recognition | Unverified | 0 |
| Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models | Feb 18, 2025 | Image to text, Optical Character Recognition | Code Available | 0 |
| Visual Graph Question Answering with ASP and LLMs for Language Parsing | Feb 13, 2025 | Graph Question Answering, Optical Character Recognition | Unverified | 0 |
| Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents | Feb 6, 2025 | Image Captioning, Optical Character Recognition | Unverified | 0 |
| LoCoML: A Framework for Real-World ML Inference Pipelines | Jan 24, 2025 | Automatic Speech Recognition, Machine Translation | Unverified | 0 |
| Exploring AI-based System Design for Pixel-level Protected Health Information Detection in Medical Images | Jan 16, 2025 | De-identification, Optical Character Recognition | Unverified | 0 |
| Comparative analysis of optical character recognition methods for Sámi texts from the National Library of Norway | Jan 13, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| Efficient License Plate Recognition in Videos Using Visual Rhythm and Accumulative Line Analysis | Jan 8, 2025 | License Plate Detection, License Plate Recognition | Code Available | 0 |
| SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild | Jan 6, 2025 | Attribute, Optical Character Recognition | Unverified | 0 |
| Efficient Video-Based ALPR System Using YOLO and Visual Rhythm | Jan 4, 2025 | License Plate Recognition, Optical Character Recognition | Code Available | 0 |
| Embedding Similarity Guided License Plate Super Resolution | Jan 2, 2025 | License Plate Recognition, Optical Character Recognition | Unverified | 0 |
| CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR | Jan 1, 2025 | All, Optical Character Recognition | Unverified | 0 |
| Optical Character Recognition using Convolutional Neural Networks for Ashokan Brahmi Inscriptions | Dec 29, 2024 | Data Augmentation, Image Segmentation | Unverified | 0 |
| Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study | Dec 29, 2024 | Motion Detection, Optical Character Recognition | Code Available | 0 |
| VORTEX: A Spatial Computing Framework for Optimized Drone Telemetry Extraction from First-Person View Flight Data | Dec 24, 2024 | Computational Efficiency, Optical Character Recognition | Unverified | 0 |
| Leveraging Deep Learning with Multi-Head Attention for Accurate Extraction of Medicine from Handwritten Prescriptions | Dec 24, 2024 | Optical Character Recognition | Unverified | 0 |
| ERPA: Efficient RPA Model Integrating OCR and LLMs for Intelligent Document Processing | Dec 24, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| LMV-RPA: Large Model Voting-based Robotic Process Automation | Dec 23, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts | Dec 20, 2024 | Benchmarking, Optical Character Recognition | Code Available | 0 |
| Advancing Vehicle Plate Recognition: Multitasking Visual Language Models with VehiclePaliGemma | Dec 14, 2024 | GPU, License Plate Recognition | Unverified | 0 |
| RoundTripOCR: A Data Generation Technique for Enhancing Post-OCR Error Correction in Low-Resource Devanagari Languages | Dec 14, 2024 | Machine Translation, Optical Character Recognition | Code Available | 0 |
| Enhancement of text recognition for hanja handwritten documents of Ancient Korea | Dec 14, 2024 | Data Augmentation, object-detection | Unverified | 0 |
| POINTS1.5: Building a Vision-Language Model towards Real World Applications | Dec 11, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Aligned Music Notation and Lyrics Transcription | Dec 5, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Text Change Detection in Multilingual Documents Using Image Comparison | Dec 5, 2024 | Binarization, Change Detection | Unverified | 0 |
| Patchfinder: Leveraging Visual Language Models for Accurate Information Retrieval using Model Uncertainty | Dec 3, 2024 | Information Retrieval, Optical Character Recognition | Unverified | 0 |
| AI-assisted summary of suicide risk Formulation | Nov 29, 2024 | Optical Character Recognition | Unverified | 0 |