Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription | Feb 27, 2025 | Handwritten Text Recognition, HTR | Code Available | 0
NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts | Feb 25, 2025 | Image Segmentation, Language Identification | Unverified | 0
Detecting Offensive Memes with Social Biases in Singapore Context Using Multimodal Large Language Models | Feb 25, 2025 | Optical Character Recognition (OCR) | Code Available | 0
Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI | Feb 24, 2025 | document understanding, Multimodal Reasoning | Unverified | 0
MultiOCR-QA: Dataset for Evaluating Robustness of LLMs in Question Answering on Multilingual OCR Texts | Feb 24, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0
Visual Zero-Shot E-Commerce Product Attribute Value Extraction | Feb 21, 2025 | Aspect Extraction, Attribute | Unverified | 0
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding | Feb 20, 2025 | document understanding, Optical Character Recognition | Unverified | 0
Harnessing PDF Data for Improving Japanese Large Multimodal Models | Feb 20, 2025 | Optical Character Recognition (OCR) | Unverified | 0
Corrupted but Not Broken: Understanding and Mitigating the Negative Impacts of Corrupted Data in Visual Instruction Tuning | Feb 18, 2025 | Optical Character Recognition (OCR) | Unverified | 0
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models | Feb 18, 2025 | Image to text, Optical Character Recognition | Code Available | 0
Southern Newswire Corpus: A Large-Scale Dataset of Mid-Century Wire Articles Beyond the Front Page | Feb 17, 2025 | Articles, Optical Character Recognition (OCR) | Unverified | 0
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | Feb 13, 2025 | Benchmarking, Math | Unverified | 0
Adapting Multilingual Embedding Models to Historical Luxembourgish | Feb 11, 2025 | Articles, Optical Character Recognition (OCR) | Unverified | 0
Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments | Feb 10, 2025 | Benchmarking, Optical Character Recognition | Code Available | 1
Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents | Feb 6, 2025 | Image Captioning, Optical Character Recognition | Unverified | 0
Towards Making Flowchart Images Machine Interpretable | Jan 29, 2025 | Code Generation, Optical Character Recognition (OCR) | Code Available | 1
MME-Industry: A Cross-Industry Multimodal Evaluation Benchmark | Jan 28, 2025 | MME, Model Optimization | Unverified | 0
Ocean-OCR: Towards General OCR Application via a Vision-Language Model | Jan 26, 2025 | document understanding, Language Modeling | Code Available | 1
Early evidence of how LLMs outperform traditional systems on OCR/HTR tasks for historical records | Jan 20, 2025 | HTR, Optical Character Recognition (OCR) | Code Available | 0
Exploring AI-based System Design for Pixel-level Protected Health Information Detection in Medical Images | Jan 16, 2025 | De-identification, Optical Character Recognition | Unverified | 0
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents | Jan 15, 2025 | Benchmarking, Optical Character Recognition (OCR) | Unverified | 0
Jochre 3 and the Yiddish OCR corpus | Jan 14, 2025 | Optical Character Recognition (OCR) | Code Available | 0
MathReader: Text-to-Speech for Mathematical Documents | Jan 13, 2025 | Optical Character Recognition (OCR), text-to-speech | Code Available | 1
Comparative analysis of optical character recognition methods for Sámi texts from the National Library of Norway | Jan 13, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model | Jan 9, 2025 | Language Modeling, Language Modelling | Code Available | 0
Efficient License Plate Recognition in Videos Using Visual Rhythm and Accumulative Line Analysis | Jan 8, 2025 | License Plate Detection, License Plate Recognition | Code Available | 0
BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations | Jan 6, 2025 | Document AI, document understanding | Unverified | 0
Geometry Restoration and Dewarping of Camera-Captured Document Images | Jan 6, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 1
SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild | Jan 6, 2025 | Attribute, Optical Character Recognition | Unverified | 0
Efficient Video-Based ALPR System Using YOLO and Visual Rhythm | Jan 4, 2025 | License Plate Recognition, Optical Character Recognition | Code Available | 0
Emergency-Brake Simplex: Toward A Verifiably Safe Control-CPS Architecture for Abrupt Runtime Reachability Constraint Changes | Jan 3, 2025 | Computational Efficiency, Optical Character Recognition (OCR) | Unverified | 0
Crossing Language Borders: A Pipeline for Indonesian Manhwa Translation | Jan 3, 2025 | Machine Translation, Object Detection | Code Available | 0
Embedding Similarity Guided License Plate Super Resolution | Jan 2, 2025 | License Plate Recognition, Optical Character Recognition | Unverified | 0
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark | Jan 1, 2025 | document understanding, Image Retrieval | Unverified | 0
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Jan 1, 2025 | document understanding, Optical Character Recognition (OCR) | Code Available | 1
CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR | Jan 1, 2025 | All, Optical Character Recognition | Unverified | 0
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining | Jan 1, 2025 | Optical Character Recognition (OCR) | Code Available | 2
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning | Dec 31, 2024 | Benchmarking, Logical Reasoning | Code Available | 4
Optical Character Recognition using Convolutional Neural Networks for Ashokan Brahmi Inscriptions | Dec 29, 2024 | Data Augmentation, Image Segmentation | Unverified | 0
Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study | Dec 29, 2024 | Motion Detection, Optical Character Recognition | Code Available | 0
ERPA: Efficient RPA Model Integrating OCR and LLMs for Intelligent Document Processing | Dec 24, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0
HAUR: Human Annotation Understanding and Recognition Through Text-Heavy Images | Dec 24, 2024 | Optical Character Recognition (OCR), Question Answering | Unverified | 0
VORTEX: A Spatial Computing Framework for Optimized Drone Telemetry Extraction from First-Person View Flight Data | Dec 24, 2024 | Computational Efficiency, Optical Character Recognition | Unverified | 0
LMV-RPA: Large Model Voting-based Robotic Process Automation | Dec 23, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0
InstructOCR: Instruction Boosting Scene Text Spotting | Dec 20, 2024 | Optical Character Recognition (OCR), Text Spotting | Code Available | 0
Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts | Dec 20, 2024 | Benchmarking, Optical Character Recognition | Code Available | 0
TextSleuth: Towards Explainable Tampered Text Detection | Dec 19, 2024 | Domain Generalization, Optical Character Recognition (OCR) | Unverified | 0
Track the Answer: Extending TextVQA from Image to Video with Spatio-Temporal Clues | Dec 17, 2024 | Language Modeling, Language Modelling | Code Available | 0
DoPTA: Improving Document Layout Analysis using Patch-Text Alignment | Dec 17, 2024 | Document AI, Document Image Classification | Unverified | 0
Advanced ingestion process powered by LLM parsing for RAG system | Dec 16, 2024 | Optical Character Recognition (OCR), RAG | Unverified | 0