| How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads | May 21, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Every Pixel Tells a Story: End-to-End Urdu Newspaper OCR | May 20, 2025 | Articles, Image Super-Resolution | Unverified | 0 |
| Low-Resource Language Processing: An OCR-Driven Summarization and Translation Pipeline | May 16, 2025 | Abstractive Text Summarization, Language Modeling | Code Available | 0 |
| PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language | May 15, 2025 | Benchmarking, Optical Character Recognition | Code Available | 0 |
| A document processing pipeline for the construction of a dataset for topic modeling based on the judgments of the Italian Supreme Court | May 13, 2025 | Diversity, Document Layout Analysis | Unverified | 0 |
| Reproducibility, Replicability, and Insights into Visual Document Retrieval with Late Interaction | May 12, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| Development of a WAZOBIA-Named Entity Recognition System | May 10, 2025 | Machine Translation, named-entity-recognition | Unverified | 0 |
| Toward Advancing License Plate Super-Resolution in Real-World Scenarios: A Dataset and Benchmark | May 9, 2025 | License Plate Recognition, Optical Character Recognition | Code Available | 0 |
| Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding | May 9, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| ChemRxivQuest: A Curated Chemistry Question-Answer Database Extracted from ChemRxiv Preprints | May 8, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Lost in OCR Translation? Vision-Based Approaches to Robust Document Retrieval | May 8, 2025 | Computational Efficiency, Optical Character Recognition | Unverified | 0 |
| DOTA: Deformable Optimized Transformer Architecture for End-to-End Text Recognition with Retrieval-Augmented Generation | May 7, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer | May 2, 2025 | document understanding, Hallucination | Unverified | 0 |
| Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models | Apr 16, 2025 | document understanding, Layout Design | Code Available | 0 |
| Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR | Apr 15, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Relation-Rich Visual Document Generator for Visual Information Extraction | Apr 14, 2025 | Diversity, document understanding | Code Available | 0 |
| NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | Apr 12, 2025 | Benchmarking, Document AI | Unverified | 0 |
| Towards Calibration Enhanced Network by Inverse Adversarial Attack | Apr 8, 2025 | Adversarial Attack, Optical Character Recognition | Unverified | 0 |
| Context-Independent OCR with Multimodal LLMs: Effects of Image Resolution and Visual Complexity | Mar 31, 2025 | Image Captioning, Optical Character Recognition | Unverified | 0 |
| TFIC: End-to-End Text-Focused Image Compression for Coding for Machines | Mar 25, 2025 | Image Compression, Optical Character Recognition | Unverified | 0 |
| AI-Driven Multi-Stage Computer Vision System for Defect Detection in Laser-Engraved Industrial Nameplates | Mar 5, 2025 | Anomaly Detection, Defect Detection | Unverified | 0 |
| Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription | Feb 27, 2025 | Handwritten Text Recognition, HTR | Code Available | 0 |
| MultiOCR-QA: Dataset for Evaluating Robustness of LLMs in Question Answering on Multilingual OCR Texts | Feb 24, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding | Feb 20, 2025 | document understanding, Optical Character Recognition | Unverified | 0 |
| Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models | Feb 18, 2025 | Image to text, Optical Character Recognition | Code Available | 0 |