Reasoning-OCR: Can Large Multimodal Models Solve Complex Logical Reasoning Problems from OCR Cues? | May 19, 2025 | Logical Reasoning; Optical Character Recognition | Code Available, 1
The Hidden Structure -- Improving Legal Document Understanding Through Explicit Text Formatting | May 19, 2025 | document understanding; Optical Character Recognition (OCR) | Unverified, 0
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents | May 19, 2025 | Dataset Generation; Optical Character Recognition (OCR) | Unverified, 0
LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images? | May 18, 2025 | Logical Reasoning; Multimodal Reasoning | Code Available, 1
Object-Centric Representations Improve Policy Generalization in Robot Manipulation | May 16, 2025 | Optical Character Recognition (OCR); Robot Manipulation | Unverified, 0
An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents | May 16, 2025 | Form; Language Modeling | Code Available, 0
Low-Resource Language Processing: An OCR-Driven Summarization and Translation Pipeline | May 16, 2025 | Abstractive Text Summarization; Language Modeling | Code Available, 0
Analyzing Patterns and Influence of Advertising in Print Newspapers | May 16, 2025 | Articles; Optical Character Recognition (OCR) | Unverified, 0
Towards Self-Improvement of Diffusion Models via Group Preference Optimization | May 16, 2025 | Optical Character Recognition (OCR) | Unverified, 0
PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language | May 15, 2025 | Benchmarking; Optical Character Recognition | Code Available, 0
A document processing pipeline for the construction of a dataset for topic modeling based on the judgments of the Italian Supreme Court | May 13, 2025 | Diversity; Document Layout Analysis | Unverified, 0
Reproducibility, Replicability, and Insights into Visual Document Retrieval with Late Interaction | May 12, 2025 | Optical Character Recognition; Optical Character Recognition (OCR) | Code Available, 0
Gameplay Highlights Generation | May 12, 2025 | Event Detection; Highlight Detection | Unverified, 0
Development of a WAZOBIA-Named Entity Recognition System | May 10, 2025 | Machine Translation; named-entity-recognition | Unverified, 0
Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding | May 9, 2025 | Optical Character Recognition; Optical Character Recognition (OCR) | Code Available, 0
Toward Advancing License Plate Super-Resolution in Real-World Scenarios: A Dataset and Benchmark | May 9, 2025 | License Plate Recognition; Optical Character Recognition | Code Available, 0
Lost in OCR Translation? Vision-Based Approaches to Robust Document Retrieval | May 8, 2025 | Computational Efficiency; Optical Character Recognition | Unverified, 0
GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing | May 8, 2025 | Optical Character Recognition (OCR); Scene Text Editing | Unverified, 0
ChemRxivQuest: A Curated Chemistry Question-Answer Database Extracted from ChemRxiv Preprints | May 8, 2025 | Optical Character Recognition; Optical Character Recognition (OCR) | Unverified, 0
DOTA: Deformable Optimized Transformer Architecture for End-to-End Text Recognition with Retrieval-Augmented Generation | May 7, 2025 | Optical Character Recognition; Optical Character Recognition (OCR) | Unverified, 0
SymbioticRAG: Enhancing Document Intelligence Through Human-LLM Symbiotic Collaboration | May 5, 2025 | Optical Character Recognition (OCR); RAG | Unverified, 0
Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer | May 2, 2025 | document understanding; Hallucination | Unverified, 0
Entropy Heat-Mapping: Localizing GPT-Based OCR Errors with Sliding-Window Shannon Analysis | Apr 30, 2025 | Optical Character Recognition (OCR) | Unverified, 0
Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform | Apr 21, 2025 | Boundary Detection; Optical Character Recognition (OCR) | Unverified, 0
Guidelines for External Disturbance Factors in the Use of OCR in Real-World Environments | Apr 21, 2025 | Optical Character Recognition (OCR) | Unverified, 0
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models | Apr 16, 2025 | document understanding; Layout Design | Code Available, 0
Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR | Apr 15, 2025 | Optical Character Recognition; Optical Character Recognition (OCR) | Unverified, 0
Relation-Rich Visual Document Generator for Visual Information Extraction | Apr 14, 2025 | Diversity; document understanding | Code Available, 0
NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | Apr 12, 2025 | Benchmarking; Document AI | Unverified, 0
Kimi-VL Technical Report | Apr 10, 2025 | Long-Context Understanding; Mathematical Reasoning | Code Available, 5
Towards Calibration Enhanced Network by Inverse Adversarial Attack | Apr 8, 2025 | Adversarial Attack; Optical Character Recognition | Unverified, 0
Towards Visual Text Grounding of Multimodal Large Language Model | Apr 7, 2025 | Benchmarking; Language Modeling | Unverified, 0
VISTA-OCR: Towards generative and interactive end to end OCR models | Apr 4, 2025 | Decoder; Optical Character Recognition (OCR) | Unverified, 0
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding | Apr 3, 2025 | document understanding; Language Modeling | Unverified, 0
Multimodal LLMs for OCR, OCR Post-Correction, and Named Entity Recognition in Historical Documents | Apr 1, 2025 | named-entity-recognition; Named Entity Recognition | Code Available, 1
Context-Independent OCR with Multimodal LLMs: Effects of Image Resolution and Visual Complexity | Mar 31, 2025 | Image Captioning; Optical Character Recognition | Unverified, 0
From Panels to Prose: Generating Literary Narratives from Comics | Mar 30, 2025 | Optical Character Recognition (OCR) | Code Available, 3
BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction | Mar 25, 2025 | document understanding; object-detection | Code Available, 0
TFIC: End-to-End Text-Focused Image Compression for Coding for Machines | Mar 25, 2025 | Image Compression; Optical Character Recognition | Unverified, 0
PM4Bench: A Parallel Multilingual Multi-Modal Multi-task Benchmark for Large Vision Language Model | Mar 24, 2025 | Language Modeling; Language Modelling | Code Available, 1
Slide2Text: Leveraging LLMs for Personalized Textbook Generation from PowerPoint Presentations | Mar 22, 2025 | Optical Character Recognition (OCR) | Unverified, 0
KL3M Tokenizers: A Family of Domain-Specific and Character-Level Tokenizers for Legal, Financial, and Preprocessing Applications | Mar 21, 2025 | 16k; 4k | Code Available, 0
A Data-driven Investigation of Euphemistic Language: Comparing the usage of "slave" and "servant" in 19th century US newspapers | Mar 19, 2025 | Optical Character Recognition (OCR) | Code Available, 0
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents | Mar 13, 2025 | Computational Efficiency; Optical Character Recognition (OCR) | Unverified, 0
KAP: MLLM-assisted OCR Text Enhancement for Hybrid Retrieval in Chinese Non-Narrative Documents | Mar 11, 2025 | Optical Character Recognition (OCR); Retrieval | Code Available, 0
Revisiting Noise in Natural Language Processing for Computational Social Science | Mar 10, 2025 | Optical Character Recognition (OCR) | Unverified, 0
CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model | Mar 9, 2025 | Hallucination; Language Modeling | Unverified, 0
PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks | Mar 6, 2025 | document understanding; Language Modeling | Code Available, 0
AI-Driven Multi-Stage Computer Vision System for Defect Detection in Laser-Engraved Industrial Nameplates | Mar 5, 2025 | Anomaly Detection; Defect Detection | Unverified, 0
An Approach for Air Drawing Using Background Subtraction and Contour Extraction | Mar 3, 2025 | Hand Detection; Optical Character Recognition (OCR) | Code Available, 2