MT^3: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning May 26, 2025 document understanding Machine Translation
Unverified 0 TextDiffuser-RL: Efficient and Robust Text Layout Optimization for High-Fidelity Text-to-Image Synthesis May 25, 2025 CPU GPU
Unverified 0 Words as Geometric Features: Estimating Homography using Optical Character Recognition as Compressed Image Representation May 25, 2025 Anomaly Detection Homography Estimation
Unverified 0 TextFlux: An OCR-Free DiT Model for High-Fidelity Multilingual Scene Text Synthesis May 23, 2025 Optical Character Recognition (OCR) Text Generation
Unverified 0 One RL to See Them All: Visual Triple Unified Reinforcement Learning May 23, 2025 All Math
Unverified 0 TokBench: Evaluating Your Visual Tokenizer before Visual Generation May 23, 2025 Face Recognition Face Reconstruction
Unverified 0 What Media Frames Reveal About Stance: A Dataset and Study about Memes in Climate Change Discourse May 22, 2025 Optical Character Recognition (OCR) Stance Detection
Unverified 0 OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning May 22, 2025 Optical Character Recognition (OCR) Visual Reasoning
Code Available 0 How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads May 21, 2025 Optical Character Recognition Optical Character Recognition (OCR)
Unverified 0 Every Pixel Tells a Story: End-to-End Urdu Newspaper OCR May 20, 2025 Articles Image Super-Resolution
Unverified 0 Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents May 19, 2025 Dataset Generation Optical Character Recognition (OCR)
Unverified 0 The Hidden Structure -- Improving Legal Document Understanding Through Explicit Text Formatting May 19, 2025 document understanding Optical Character Recognition (OCR)
Unverified 0 Low-Resource Language Processing: An OCR-Driven Summarization and Translation Pipeline May 16, 2025 Abstractive Text Summarization Language Modeling
Code Available 0 Object-Centric Representations Improve Policy Generalization in Robot Manipulation May 16, 2025 Optical Character Recognition (OCR) Robot Manipulation
Unverified 0 Analyzing Patterns and Influence of Advertising in Print Newspapers May 16, 2025 Articles Optical Character Recognition (OCR)
Unverified 0 An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents May 16, 2025 Form Language Modeling
Code Available 0 Towards Self-Improvement of Diffusion Models via Group Preference Optimization May 16, 2025 Optical Character Recognition (OCR)
Unverified 0 PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language May 15, 2025 Benchmarking Optical Character Recognition
Code Available 0 A document processing pipeline for the construction of a dataset for topic modeling based on the judgments of the Italian Supreme Court May 13, 2025 Diversity Document Layout Analysis
Unverified 0 Reproducibility, Replicability, and Insights into Visual Document Retrieval with Late Interaction May 12, 2025 Optical Character Recognition Optical Character Recognition (OCR)
Code Available 0 Gameplay Highlights Generation May 12, 2025 Event Detection Highlight Detection
Unverified 0 Development of a WAZOBIA-Named Entity Recognition System May 10, 2025 Machine Translation named-entity-recognition
Unverified 0 Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding May 9, 2025 Optical Character Recognition Optical Character Recognition (OCR)
Code Available 0 Toward Advancing License Plate Super-Resolution in Real-World Scenarios: A Dataset and Benchmark May 9, 2025 License Plate Recognition Optical Character Recognition
Code Available 0 ChemRxivQuest: A Curated Chemistry Question-Answer Database Extracted from ChemRxiv Preprints May 8, 2025 Optical Character Recognition Optical Character Recognition (OCR)
Unverified 0 GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing May 8, 2025 Optical Character Recognition (OCR) Scene Text Editing
Unverified 0 Lost in OCR Translation? Vision-Based Approaches to Robust Document Retrieval May 8, 2025 Computational Efficiency Optical Character Recognition
Unverified 0 DOTA: Deformable Optimized Transformer Architecture for End-to-End Text Recognition with Retrieval-Augmented Generation May 7, 2025 Optical Character Recognition Optical Character Recognition (OCR)
Unverified 0 SymbioticRAG: Enhancing Document Intelligence Through Human-LLM Symbiotic Collaboration May 5, 2025 Optical Character Recognition (OCR) RAG
Unverified 0 Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer May 2, 2025 document understanding Hallucination
Unverified 0 Entropy Heat-Mapping: Localizing GPT-Based OCR Errors with Sliding-Window Shannon Analysis Apr 30, 2025 Optical Character Recognition (OCR)
Unverified 0 Guidelines for External Disturbance Factors in the Use of OCR in Real-World Environments Apr 21, 2025 Optical Character Recognition (OCR)
Unverified 0 Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform Apr 21, 2025 Boundary Detection Optical Character Recognition (OCR)
Unverified 0 Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models Apr 16, 2025 document understanding Layout Design
Code Available 0 Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR Apr 15, 2025 Optical Character Recognition Optical Character Recognition (OCR)
Unverified 0 Relation-Rich Visual Document Generator for Visual Information Extraction Apr 14, 2025 Diversity document understanding
Code Available 0 NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding Apr 12, 2025 Benchmarking Document AI
Unverified 0 Towards Calibration Enhanced Network by Inverse Adversarial Attack Apr 8, 2025 Adversarial Attack Optical Character Recognition
Unverified 0 Towards Visual Text Grounding of Multimodal Large Language Model Apr 7, 2025 Benchmarking Language Modeling
Unverified 0 VISTA-OCR: Towards generative and interactive end to end OCR models Apr 4, 2025 Decoder Optical Character Recognition (OCR)
Unverified 0 QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding Apr 3, 2025 document understanding Language Modeling
Unverified 0 Context-Independent OCR with Multimodal LLMs: Effects of Image Resolution and Visual Complexity Mar 31, 2025 Image Captioning Optical Character Recognition
Unverified 0 BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction Mar 25, 2025 document understanding object-detection
Code Available 0 TFIC: End-to-End Text-Focused Image Compression for Coding for Machines Mar 25, 2025 Image Compression Optical Character Recognition
Unverified 0 Slide2Text: Leveraging LLMs for Personalized Textbook Generation from PowerPoint Presentations Mar 22, 2025 Optical Character Recognition (OCR)
Unverified 0 KL3M Tokenizers: A Family of Domain-Specific and Character-Level Tokenizers for Legal, Financial, and Preprocessing Applications Mar 21, 2025 16k 4k
Code Available 0 A Data-driven Investigation of Euphemistic Language: Comparing the usage of "slave" and "servant" in 19th century US newspapers Mar 19, 2025 Optical Character Recognition (OCR)
Code Available 0 LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents Mar 13, 2025 Computational Efficiency Optical Character Recognition (OCR)
Unverified 0 KAP: MLLM-assisted OCR Text Enhancement for Hybrid Retrieval in Chinese Non-Narrative Documents Mar 11, 2025 Optical Character Recognition (OCR) Retrieval
Code Available 0 Revisiting Noise in Natural Language Processing for Computational Social Science Mar 10, 2025 Optical Character Recognition (OCR)
Unverified 0