DocVLM: Make Your VLM an Efficient Reader | Dec 11, 2024 | document understanding, Optical Character Recognition (OCR)
[Unverified] DocSum: Domain-Adaptive Pre-training for Document Abstractive Summarization | Dec 11, 2024 | Abstractive Text Summarization, Decision Making
[Unverified] Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models | Dec 6, 2024 | Hallucination, Optical Character Recognition (OCR)
[Unverified] Text Change Detection in Multilingual Documents Using Image Comparison | Dec 5, 2024 | Binarization, Change Detection
[Unverified] Aligned Music Notation and Lyrics Transcription | Dec 5, 2024 | Language Modeling, Language Modelling
[Code Available] SynFinTabs: A Dataset of Synthetic Financial Tables for Information and Table Extraction | Dec 5, 2024 | Articles, Dataset Generation
[Code Available] CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy | Dec 3, 2024 | Hallucination, Key Information Extraction
[Unverified] Arabic Handwritten Document OCR Solution with Binarization and Adaptive Scale Fusion Detection | Dec 2, 2024 | Binarization, Optical Character Recognition (OCR)
[Unverified] DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness | Nov 29, 2024 | Optical Character Recognition (OCR), Question Answering
[Code Available] VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models | Nov 28, 2024 | Language Modeling, Language Modelling
[Unverified] Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs | Nov 28, 2024 | Attribute, Hallucination
[Unverified] SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition | Nov 24, 2024 | Decoder, Optical Character Recognition (OCR)
[Unverified] Towards Accessible Learning: Deep Learning-Based Potential Dysgraphia Detection and OCR for Potentially Dysgraphic Handwriting | Nov 18, 2024 | Diagnostic, Optical Character Recognition
[Unverified] DriveThru: a Document Extraction Platform and Benchmark Datasets for Indonesian Local Language Archives | Nov 14, 2024 | Optical Character Recognition, Optical Character Recognition (OCR)
[Code Available] Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding | Nov 12, 2024 | document understanding, Optical Character Recognition (OCR)
[Unverified] Veri-Car: Towards Open-world Vehicle Information Retrieval | Nov 11, 2024 | Information Retrieval, License Plate Detection
[Unverified] NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts | Nov 8, 2024 | Mixture-of-Experts, Optical Character Recognition (OCR)
[Unverified] Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding | Nov 8, 2024 | document understanding, Optical Character Recognition (OCR)
[Unverified] TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models | Nov 7, 2024 | Optical Character Recognition, Optical Character Recognition (OCR)
[Unverified] M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | Nov 7, 2024 | document understanding, Optical Character Recognition
[Unverified] Out-of-Distribution Recovery with Object-Centric Keypoint Inverse Policy for Visuomotor Imitation Learning | Nov 5, 2024 | Continual Learning, Imitation Learning
[Unverified] HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction | Nov 2, 2024 | Image Reconstruction, Optical Character Recognition (OCR)
[Unverified] Handwriting Recognition in Historical Documents with Multimodal LLM | Oct 31, 2024 | Handwriting Recognition, Optical Character Recognition
[Unverified] Are VLMs Really Blind | Oct 29, 2024 | Language Modeling, Language Modelling
[Code Available] Structured Analysis and Comparison of Alphabets in Historical Handwritten Ciphers | Oct 29, 2024 | Cryptanalysis, Optical Character Recognition (OCR)
[Unverified] MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding | Oct 25, 2024 | Benchmarking, document understanding
[Unverified] Towards Visual Text Design Transfer Across Languages | Oct 24, 2024 | Image Generation, Optical Character Recognition (OCR)
[Unverified] Reference-Based Post-OCR Processing with LLM for Diacritic Languages | Oct 17, 2024 | Optical Character Recognition (OCR)
[Unverified] Harnessing Webpage UIs for Text-Rich Visual Understanding | Oct 17, 2024 | document understanding, Optical Character Recognition (OCR)
[Unverified] LEGAL-UQA: A Low-Resource Urdu-English Dataset for Legal Question Answering | Oct 16, 2024 | Optical Character Recognition (OCR), Question Answering
[Code Available] Enhancing Assamese NLP Capabilities: Introducing a Centralized Dataset Repository | Oct 15, 2024 | Diversity, Machine Translation
[Code Available] Comparison of Image Preprocessing Techniques for Vehicle License Plate Recognition Using OCR: Performance and Accuracy Evaluation | Oct 15, 2024 | License Plate Recognition, Optical Character Recognition
[Unverified] ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training | Oct 14, 2024 | document understanding, Optical Character Recognition (OCR)
[Unverified] TextMaster: Universal Controllable Text Edit | Oct 13, 2024 | Optical Character Recognition (OCR), Style Transfer
[Unverified] MIRAGE: Multimodal Identification and Recognition of Annotations in Indian General Prescriptions | Oct 13, 2024 | Handwriting Recognition, Optical Character Recognition
[Unverified] Unraveling Movie Genres through Cross-Attention Fusion of Bi-Modal Synergy of Poster | Oct 12, 2024 | Genre classification, Marketing
[Unverified] Mero Nagarikta: Advanced Nepali Citizenship Data Extractor with Deep Learning-Powered Text Detection and OCR | Oct 8, 2024 | object-detection, Object Detection
[Unverified] Automated Quality Control System for Canned Tuna Production using Artificial Vision | Oct 8, 2024 | GPU, Optical Character Recognition (OCR)
[Unverified] Transformers Utilization in Chart Understanding: A Review of Recent Advances & Future Trends | Oct 5, 2024 | Benchmarking, Chart Understanding
[Unverified] Khattat: Enhancing Readability and Concept Representation of Semantic Typography | Oct 1, 2024 | Language Modeling, Language Modelling
[Unverified] World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering | Sep 30, 2024 | Optical Character Recognition (OCR), Question Answering
[Code Available] JaPOC: Japanese Post-OCR Correction Benchmark using Vouchers | Sep 30, 2024 | Optical Character Recognition, Optical Character Recognition (OCR)
[Unverified] MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning | Sep 30, 2024 | Mixture-of-Experts, Optical Character Recognition (OCR)
[Unverified] Scrambled text: training Language Models to correct OCR errors using synthetic data | Sep 29, 2024 | Articles, Language Modeling
[Code Available] See then Tell: Enhancing Key Information Extraction with Vision Grounding | Sep 29, 2024 | Image to text, Key Information Extraction
[Unverified] CodeSCAN: ScreenCast ANalysis for Video Programming Tutorials | Sep 27, 2024 | Optical Character Recognition, Optical Character Recognition (OCR)
[Unverified] JoyType: A Robust Design for Multilingual Visual Text Creation | Sep 26, 2024 | Image Generation, Optical Character Recognition (OCR)
[Unverified] MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features | Sep 25, 2024 | Optical Character Recognition, Optical Character Recognition (OCR)
[Code Available] Investigating OCR-Sensitive Neurons to Improve Entity Recognition in Historical Documents | Sep 25, 2024 | named-entity-recognition, Named Entity Recognition
[Code Available] @Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology | Sep 21, 2024 | Benchmarking, Depth Estimation
[Unverified]