Title | Date | Tasks | Code status
CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model | Mar 9, 2025 | Hallucination, Language Modeling | Unverified
PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks | Mar 6, 2025 | document understanding, Language Modeling | Unverified
AI-Driven Multi-Stage Computer Vision System for Defect Detection in Laser-Engraved Industrial Nameplates | Mar 5, 2025 | Anomaly Detection, Defect Detection | Unverified
Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription | Feb 27, 2025 | Handwritten Text Recognition, HTR | Code Available
Detecting Offensive Memes with Social Biases in Singapore Context Using Multimodal Large Language Models | Feb 25, 2025 | Optical Character Recognition (OCR) | Code Available
NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts | Feb 25, 2025 | Image Segmentation, Language Identification | Unverified
MultiOCR-QA: Dataset for Evaluating Robustness of LLMs in Question Answering on Multilingual OCR Texts | Feb 24, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available
Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI | Feb 24, 2025 | document understanding, Multimodal Reasoning | Unverified
Visual Zero-Shot E-Commerce Product Attribute Value Extraction | Feb 21, 2025 | Aspect Extraction, Attribute | Unverified
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding | Feb 20, 2025 | document understanding, Optical Character Recognition | Unverified
Harnessing PDF Data for Improving Japanese Large Multimodal Models | Feb 20, 2025 | Optical Character Recognition (OCR) | Unverified
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models | Feb 18, 2025 | Image to text, Optical Character Recognition | Code Available
Corrupted but Not Broken: Understanding and Mitigating the Negative Impacts of Corrupted Data in Visual Instruction Tuning | Feb 18, 2025 | Optical Character Recognition (OCR) | Unverified
Southern Newswire Corpus: A Large-Scale Dataset of Mid-Century Wire Articles Beyond the Front Page | Feb 17, 2025 | Articles, Optical Character Recognition (OCR) | Unverified
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | Feb 13, 2025 | Benchmarking, Math | Unverified
Adapting Multilingual Embedding Models to Historical Luxembourgish | Feb 11, 2025 | Articles, Optical Character Recognition (OCR) | Unverified
Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents | Feb 6, 2025 | Image Captioning, Optical Character Recognition | Unverified
MME-Industry: A Cross-Industry Multimodal Evaluation Benchmark | Jan 28, 2025 | MME, Model Optimization | Unverified
Early evidence of how LLMs outperform traditional systems on OCR/HTR tasks for historical records | Jan 20, 2025 | HTR, Optical Character Recognition (OCR) | Code Available
Exploring AI-based System Design for Pixel-level Protected Health Information Detection in Medical Images | Jan 16, 2025 | De-identification, Optical Character Recognition | Unverified
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents | Jan 15, 2025 | Benchmarking, Optical Character Recognition (OCR) | Unverified
Jochre 3 and the Yiddish OCR corpus | Jan 14, 2025 | Optical Character Recognition (OCR) | Code Available
Comparative analysis of optical character recognition methods for Sámi texts from the National Library of Norway | Jan 13, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model | Jan 9, 2025 | Language Modeling, Language Modelling | Code Available
Efficient License Plate Recognition in Videos Using Visual Rhythm and Accumulative Line Analysis | Jan 8, 2025 | License Plate Detection, License Plate Recognition | Code Available
SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild | Jan 6, 2025 | Attribute, Optical Character Recognition | Unverified
BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations | Jan 6, 2025 | Document AI, document understanding | Unverified
Efficient Video-Based ALPR System Using YOLO and Visual Rhythm | Jan 4, 2025 | License Plate Recognition, Optical Character Recognition | Code Available
Emergency-Brake Simplex: Toward A Verifiably Safe Control-CPS Architecture for Abrupt Runtime Reachability Constraint Changes | Jan 3, 2025 | Computational Efficiency, Optical Character Recognition (OCR) | Unverified
Crossing Language Borders: A Pipeline for Indonesian Manhwa Translation | Jan 3, 2025 | Machine Translation, Object Detection | Code Available
Embedding Similarity Guided License Plate Super Resolution | Jan 2, 2025 | License Plate Recognition, Optical Character Recognition | Unverified
CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR | Jan 1, 2025 | All, Optical Character Recognition | Unverified
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark | Jan 1, 2025 | document understanding, Image Retrieval | Unverified
Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study | Dec 29, 2024 | Motion Detection, Optical Character Recognition | Code Available
Optical Character Recognition using Convolutional Neural Networks for Ashokan Brahmi Inscriptions | Dec 29, 2024 | Data Augmentation, Image Segmentation | Unverified
VORTEX: A Spatial Computing Framework for Optimized Drone Telemetry Extraction from First-Person View Flight Data | Dec 24, 2024 | Computational Efficiency, Optical Character Recognition | Unverified
HAUR: Human Annotation Understanding and Recognition Through Text-Heavy Images | Dec 24, 2024 | Optical Character Recognition (OCR), Question Answering | Unverified
ERPA: Efficient RPA Model Integrating OCR and LLMs for Intelligent Document Processing | Dec 24, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified
LMV-RPA: Large Model Voting-based Robotic Process Automation | Dec 23, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available
Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts | Dec 20, 2024 | Benchmarking, Optical Character Recognition | Code Available
InstructOCR: Instruction Boosting Scene Text Spotting | Dec 20, 2024 | Optical Character Recognition (OCR), Text Spotting | Code Available
TextSleuth: Towards Explainable Tampered Text Detection | Dec 19, 2024 | Domain Generalization, Optical Character Recognition (OCR) | Unverified
Track the Answer: Extending TextVQA from Image to Video with Spatio-Temporal Clues | Dec 17, 2024 | Language Modeling, Language Modelling | Code Available
DoPTA: Improving Document Layout Analysis using Patch-Text Alignment | Dec 17, 2024 | Document AI, Document Image Classification | Unverified
Advanced ingestion process powered by LLM parsing for RAG system | Dec 16, 2024 | Optical Character Recognition (OCR), RAG | Unverified
RoundTripOCR: A Data Generation Technique for Enhancing Post-OCR Error Correction in Low-Resource Devanagari Languages | Dec 14, 2024 | Machine Translation, Optical Character Recognition | Code Available
Advancing Vehicle Plate Recognition: Multitasking Visual Language Models with VehiclePaliGemma | Dec 14, 2024 | GPU, License Plate Recognition | Unverified
Enhancement of text recognition for hanja handwritten documents of Ancient Korea | Dec 14, 2024 | Data Augmentation, object-detection | Unverified
One Filter to Deploy Them All: Robust Safety for Quadrupedal Navigation in Unknown Environments | Dec 13, 2024 | All, Optical Character Recognition (OCR) | Unverified
AI Adoption to Combat Financial Crime: Study on Natural Language Processing in Adverse Media Screening of Financial Services in English and Bangla multilingual interpretation | Dec 12, 2024 | Optical Character Recognition (OCR) | Unverified