RoundTripOCR: A Data Generation Technique for Enhancing Post-OCR Error Correction in Low-Resource Devanagari Languages | Dec 14, 2024 | Machine Translation; Optical Character Recognition | Code Available 0
Advancing Vehicle Plate Recognition: Multitasking Visual Language Models with VehiclePaliGemma | Dec 14, 2024 | GPU; License Plate Recognition | Unverified 0
Enhancement of text recognition for hanja handwritten documents of Ancient Korea | Dec 14, 2024 | Data Augmentation; object-detection | Unverified 0
One Filter to Deploy Them All: Robust Safety for Quadrupedal Navigation in Unknown Environments | Dec 13, 2024 | All; Optical Character Recognition (OCR) | Unverified 0
AI Adoption to Combat Financial Crime: Study on Natural Language Processing in Adverse Media Screening of Financial Services in English and Bangla multilingual interpretation | Dec 12, 2024 | Optical Character Recognition (OCR) | Unverified 0
DocVLM: Make Your VLM an Efficient Reader | Dec 11, 2024 | document understanding; Optical Character Recognition (OCR) | Unverified 0
DocSum: Domain-Adaptive Pre-training for Document Abstractive Summarization | Dec 11, 2024 | Abstractive Text Summarization; Decision Making | Unverified 0
TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action | Dec 7, 2024 | Depth Estimation; Mathematical Reasoning | Code Available 2
Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models | Dec 6, 2024 | Hallucination; Optical Character Recognition (OCR) | Unverified 0
Aligned Music Notation and Lyrics Transcription | Dec 5, 2024 | Language Modeling; Language Modelling | Code Available 0
Text Change Detection in Multilingual Documents Using Image Comparison | Dec 5, 2024 | Binarization; Change Detection | Unverified 0
SynFinTabs: A Dataset of Synthetic Financial Tables for Information and Table Extraction | Dec 5, 2024 | Articles; Dataset Generation | Code Available 0
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion | Dec 5, 2024 | Contrastive Learning; Hallucination | Code Available 3
PaliGemma 2: A Family of Versatile VLMs for Transfer | Dec 4, 2024 | Language Modeling; Language Modelling | Code Available 3
CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy | Dec 3, 2024 | Hallucination; Key Information Extraction | Unverified 0
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation | Dec 3, 2024 | Optical Character Recognition; Optical Character Recognition (OCR) | Code Available 2
Arabic Handwritten Document OCR Solution with Binarization and Adaptive Scale Fusion Detection | Dec 2, 2024 | Binarization; Optical Character Recognition (OCR) | Unverified 0
TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition | Dec 2, 2024 | Image Generation; Optical Character Recognition (OCR) | Code Available 2
DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness | Nov 29, 2024 | Optical Character Recognition (OCR); Question Answering | Code Available 0
VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models | Nov 28, 2024 | Language Modeling; Language Modelling | Unverified 0
Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs | Nov 28, 2024 | Attribute; Hallucination | Unverified 0
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition | Nov 24, 2024 | Decoder; Optical Character Recognition (OCR) | Code Available 0
Arabic-Nougat: Fine-Tuning Vision Transformers for Arabic OCR and Markdown Extraction | Nov 19, 2024 | document understanding; Optical Character Recognition (OCR) | Code Available 2
Towards Accessible Learning: Deep Learning-Based Potential Dysgraphia Detection and OCR for Potentially Dysgraphic Handwriting | Nov 18, 2024 | Diagnostic; Optical Character Recognition | Unverified 0
Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts | Nov 16, 2024 | Mixture-of-Experts; Optical Character Recognition (OCR) | Code Available 1
DriveThru: a Document Extraction Platform and Benchmark Datasets for Indonesian Local Language Archives | Nov 14, 2024 | Optical Character Recognition; Optical Character Recognition (OCR) | Code Available 0
Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding | Nov 12, 2024 | document understanding; Optical Character Recognition (OCR) | Unverified 0
Veri-Car: Towards Open-world Vehicle Information Retrieval | Nov 11, 2024 | Information Retrieval; License Plate Detection | Unverified 0
Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding | Nov 8, 2024 | document understanding; Optical Character Recognition (OCR) | Unverified 0
NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts | Nov 8, 2024 | Mixture-of-Experts; Optical Character Recognition (OCR) | Unverified 0
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | Nov 7, 2024 | document understanding; Optical Character Recognition | Unverified 0
TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models | Nov 7, 2024 | Optical Character Recognition; Optical Character Recognition (OCR) | Unverified 0
Out-of-Distribution Recovery with Object-Centric Keypoint Inverse Policy for Visuomotor Imitation Learning | Nov 5, 2024 | Continual Learning; Imitation Learning | Unverified 0
HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction | Nov 2, 2024 | Image Reconstruction; Optical Character Recognition (OCR) | Unverified 0
Handwriting Recognition in Historical Documents with Multimodal LLM | Oct 31, 2024 | Handwriting Recognition; Optical Character Recognition | Unverified 0
Toxicity of the Commons: Curating Open-Source Pre-Training Data | Oct 29, 2024 | Optical Character Recognition; Optical Character Recognition (OCR) | Code Available 1
Are VLMs Really Blind | Oct 29, 2024 | Language Modeling; Language Modelling | Code Available 0
Structured Analysis and Comparison of Alphabets in Historical Handwritten Ciphers | Oct 29, 2024 | Cryptanalysis; Optical Character Recognition (OCR) | Unverified 0
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding | Oct 25, 2024 | Benchmarking; document understanding | Unverified 0
Towards Visual Text Design Transfer Across Languages | Oct 24, 2024 | Image Generation; Optical Character Recognition (OCR) | Unverified 0
Harnessing Webpage UIs for Text-Rich Visual Understanding | Oct 17, 2024 | document understanding; Optical Character Recognition (OCR) | Unverified 0
Reference-Based Post-OCR Processing with LLM for Diacritic Languages | Oct 17, 2024 | Optical Character Recognition (OCR) | Unverified 0
LEGAL-UQA: A Low-Resource Urdu-English Dataset for Legal Question Answering | Oct 16, 2024 | Optical Character Recognition (OCR); Question Answering | Code Available 0
Comparison of Image Preprocessing Techniques for Vehicle License Plate Recognition Using OCR: Performance and Accuracy Evaluation | Oct 15, 2024 | License Plate Recognition; Optical Character Recognition | Unverified 0
Enhancing Assamese NLP Capabilities: Introducing a Centralized Dataset Repository | Oct 15, 2024 | Diversity; Machine Translation | Code Available 0
ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training | Oct 14, 2024 | document understanding; Optical Character Recognition (OCR) | Unverified 0
TextMaster: Universal Controllable Text Edit | Oct 13, 2024 | Optical Character Recognition (OCR); Style Transfer | Unverified 0
Stratified Domain Adaptation: A Progressive Self-Training Approach for Scene Text Recognition | Oct 13, 2024 | Domain Adaptation; Optical Character Recognition (OCR) | Code Available 1
MIRAGE: Multimodal Identification and Recognition of Annotations in Indian General Prescriptions | Oct 13, 2024 | Handwriting Recognition; Optical Character Recognition | Unverified 0
Unraveling Movie Genres through Cross-Attention Fusion of Bi-Modal Synergy of Poster | Oct 12, 2024 | Genre classification; Marketing | Unverified 0