DeQA-Doc: Adapting DeQA-Score to Document Image Quality Assessment | Jul 17, 2025 | Document Image Quality Assessment Image Quality Assessment | Code Available | 0
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning | Jul 17, 2025 | Language Modeling Language Modelling | Code Available | 0
Seeing the Signs: A Survey of Edge-Deployable OCR Models for Billboard Visibility Analysis | Jul 15, 2025 | Marketing Optical Character Recognition | Unverified | 0
A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends | Jul 14, 2025 | document understanding Optical Character Recognition | Unverified | 0
Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning | Jul 9, 2025 | Benchmarking Image Retrieval | Code Available | 0
Design and Implementation of an OCR-Powered Pipeline for Table Extraction from Invoices | Jul 9, 2025 | Boundary Detection Optical Character Recognition (OCR) | Unverified | 0
TextPixs: Glyph-Conditioned Diffusion with Character-Aware Attention and OCR-Guided Supervision | Jul 8, 2025 | Image Generation Optical Character Recognition (OCR) | Unverified | 0
PaddleOCR 3.0 Technical Report | Jul 8, 2025 | document understanding Key Information Extraction | Code Available | 0
Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration | Jul 7, 2025 | Optical Character Recognition (OCR) | Code Available | 2
Logios : An open source Greek Polytonic Optical Character Recognition system | Jun 26, 2025 | Optical Character Recognition Optical Character Recognition (OCR) | Unverified | 0
DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Jun 26, 2025 | document understanding Optical Character Recognition (OCR) | Code Available | 0
Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation | Jun 25, 2025 | Optical Character Recognition (OCR) RAG | Unverified | 0
Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models | Jun 25, 2025 | document understanding Hallucination | Unverified | 0
Unfolding the Past: A Comprehensive Deep Learning Approach to Analyzing Incunabula Pages | Jun 22, 2025 | image-classification Image Classification | Unverified | 0
An accurate and revised version of optical character recognition-based speech synthesis using LabVIEW | Jun 18, 2025 | Optical Character Recognition Optical Character Recognition (OCR) | Unverified | 0
FormGym: Doing Paperwork with Agents | Jun 17, 2025 | Form Information Retrieval | Unverified | 0
AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding | Jun 16, 2025 | Optical Character Recognition (OCR) RAG | Code Available | 0
Efficient Medical VIE via Reinforcement Learning | Jun 16, 2025 | Diversity Optical Character Recognition (OCR) | Unverified | 0
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation | Jun 16, 2025 | Optical Character Recognition (OCR) | Unverified | 0
Intelligent Automation for FDI Facilitation: Optimizing Tariff Exemption Processes with OCR And Large Language Models | Jun 12, 2025 | Large Language Model Optical Character Recognition | Unverified | 0
Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers | Jun 12, 2025 | Hallucination Optical Character Recognition (OCR) | Unverified | 0
Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability | Jun 10, 2025 | Optical Character Recognition (OCR) | Code Available | 2
Reading in the Dark with Foveated Event Vision | Jun 7, 2025 | Optical Character Recognition Optical Character Recognition (OCR) | Unverified | 0
The OCR Quest for Generalization: Learning to recognize low-resource alphabets with model editing | Jun 7, 2025 | Meta-Learning Model Editing | Unverified | 0
A Survey on Vietnamese Document Analysis and Recognition: Challenges and Future Directions | Jun 5, 2025 | Computational Efficiency document understanding | Unverified | 0
MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories | Jun 5, 2025 | Benchmarking Optical Character Recognition | Code Available | 2
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing | Jun 1, 2025 | Document AI document understanding | Code Available | 0
SARD: A Large-Scale Synthetic Arabic OCR Dataset for Book-Style Text Recognition | May 30, 2025 | Optical Character Recognition Optical Character Recognition (OCR) | Unverified | 0
Predicting the Past: Estimating Historical Appraisals with OCR and Machine Learning | May 30, 2025 | Optical Character Recognition (OCR) | Code Available | 0
Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition | May 29, 2025 | Handwritten Mathmatical Expression Recognition Language Modeling | Code Available | 1
ChartMind: A Comprehensive Benchmark for Complex Real-world Multimodal Chart Question Answering | May 29, 2025 | Chart Question Answering Chart Understanding | Unverified | 0
TextSR: Diffusion Super-Resolution with Multilingual OCR Guidance | May 29, 2025 | Image Super-Resolution Optical Character Recognition | Unverified | 0
Synthetic Document Question Answering in Hungarian | May 29, 2025 | Optical Character Recognition (OCR) Question Answering | Code Available | 0
VidText: Towards Comprehensive Evaluation for Video Text Understanding | May 28, 2025 | Multimodal Reasoning Optical Character Recognition (OCR) | Code Available | 1
ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge | May 28, 2025 | Imitation Learning Math | Code Available | 1
E2E Process Automation Leveraging Generative AI and IDP-Based Automation Agent: A Case Study on Corporate Expense Processing | May 27, 2025 | Decision Making Optical Character Recognition (OCR) | Unverified | 0
Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging | May 26, 2025 | Language Modeling Language Modelling | Code Available | 1
On Path to Multimodal Historical Reasoning: HistBench and HistAgent | May 26, 2025 | Optical Character Recognition (OCR) | Code Available | 4
MT^3: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning | May 26, 2025 | document understanding Machine Translation | Unverified | 0
TextDiffuser-RL: Efficient and Robust Text Layout Optimization for High-Fidelity Text-to-Image Synthesis | May 25, 2025 | CPU GPU | Unverified | 0
ReadBench: Measuring the Dense Text Visual Reading Ability of Vision-Language Models | May 25, 2025 | Optical Character Recognition (OCR) Reading Comprehension | Code Available | 1
Words as Geometric Features: Estimating Homography using Optical Character Recognition as Compressed Image Representation | May 25, 2025 | Anomaly Detection Homography Estimation | Unverified | 0
TextFlux: An OCR-Free DiT Model for High-Fidelity Multilingual Scene Text Synthesis | May 23, 2025 | Optical Character Recognition (OCR) Text Generation | Unverified | 0
One RL to See Them All: Visual Triple Unified Reinforcement Learning | May 23, 2025 | All Math | Unverified | 0
TokBench: Evaluating Your Visual Tokenizer before Visual Generation | May 23, 2025 | Face Recognition Face Reconstruction | Unverified | 0
OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning | May 22, 2025 | Optical Character Recognition (OCR) Visual Reasoning | Code Available | 0
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark | May 22, 2025 | document understanding Multimodal Reasoning | Code Available | 1
What Media Frames Reveal About Stance: A Dataset and Study about Memes in Climate Change Discourse | May 22, 2025 | Optical Character Recognition (OCR) Stance Detection | Unverified | 0
How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads | May 21, 2025 | Optical Character Recognition Optical Character Recognition (OCR) | Unverified | 0
Every Pixel Tells a Story: End-to-End Urdu Newspaper OCR | May 20, 2025 | Articles Image Super-Resolution | Unverified | 0