| Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models | May 24, 2023 | document understanding, Image Captioning | Code Available | 1 |
| Document Understanding Dataset and Evaluation (DUDE) | May 15, 2023 | Document AI, document understanding | Code Available | 1 |
| LineFormer: Rethinking Line Chart Data Extraction as Instance Segmentation | May 3, 2023 | Data Visualization, document understanding | Code Available | 1 |
| CCpdf: Building a High Quality Corpus for Visually Rich Documents from Web Crawl Data | Apr 28, 2023 | document understanding, Language Modeling | Code Available | 1 |
| M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | Jan 1, 2023 | Articles, Document Layout Analysis | Code Available | 1 |
| On Web-based Visual Corpus Construction for Visual Document Understanding | Nov 7, 2022 | document understanding, Optical Character Recognition (OCR) | Code Available | 1 |
| ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding | Oct 12, 2022 | document-image-classification, Document Image Classification | Code Available | 1 |
| DocQueryNet: Value Retrieval with Arbitrary Queries for Form-like Documents | Oct 1, 2022 | document understanding, Form | Code Available | 1 |
| Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks | Aug 23, 2022 | Document Layout Analysis, document understanding | Code Available | 1 |
| End-to-end Document Recognition and Understanding with Dessurt | Mar 30, 2022 | document understanding, Visual Question Answering (VQA) | Code Available | 1 |
| Multimodal Pre-training Based on Graph Attention Network for Document Understanding | Mar 25, 2022 | document understanding, Graph Attention | Code Available | 1 |
| XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding | Mar 14, 2022 | document understanding, Optical Character Recognition (OCR) | Code Available | 1 |
| Value Retrieval with Arbitrary Queries for Form-like Documents | Dec 15, 2021 | document understanding, Form | Code Available | 1 |
| DocFormer: End-to-End Transformer for Document Understanding | Jun 22, 2021 | Document Image Classification, document understanding | Code Available | 1 |
| CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding | May 23, 2021 | document understanding, Domain Adaptation | Code Available | 1 |
| Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | Feb 18, 2021 | Decoder, Document Image Classification | Code Available | 1 |
| Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution | Jan 24, 2021 | 3D Feature Matching, document understanding | Code Available | 1 |
| A Discrete Variational Recurrent Topic Model without the Reparametrization Trick | Oct 22, 2020 | document understanding, Variational Inference | Code Available | 1 |
| MedICaT: A Dataset of Medical Images, Captions, and Textual References | Oct 12, 2020 | document understanding, Image-text matching | Code Available | 1 |
| A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends | Jul 14, 2025 | document understanding, Optical Character Recognition | —Unverified | 0 |
| PaddleOCR 3.0 Technical Report | Jul 8, 2025 | document understanding, Key Information Extraction | Code Available | 0 |
| Class-Agnostic Region-of-Interest Matching in Document Images | Jun 26, 2025 | Document Layout Analysis, document understanding | Code Available | 0 |
| DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Jun 26, 2025 | document understanding, Optical Character Recognition (OCR) | Code Available | 0 |
| Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models | Jun 25, 2025 | document understanding, Hallucination | —Unverified | 0 |
| PP-DocBee2: Improved Baselines with Efficient Data for Multimodal Document Understanding | Jun 22, 2025 | document understanding | Code Available | 0 |
| WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts | Jun 18, 2025 | document understanding, Multiple-choice | —Unverified | 0 |
| DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning | Jun 5, 2025 | document understanding, Event Detection | —Unverified | 0 |
| A Survey on Vietnamese Document Analysis and Recognition: Challenges and Future Directions | Jun 5, 2025 | Computational Efficiency, document understanding | —Unverified | 0 |
| Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing | Jun 1, 2025 | Document AI, document understanding | Code Available | 0 |
| MT^3: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning | May 26, 2025 | document understanding, Machine Translation | —Unverified | 0 |
| Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning | May 26, 2025 | document understanding, Multimodal Reasoning | —Unverified | 0 |
| Doc-CoB: Enhancing Multi-Modal Document Understanding with Visual Chain-of-Boxes Reasoning | May 24, 2025 | document understanding, Visual Reasoning | —Unverified | 0 |
| The Hidden Structure -- Improving Legal Document Understanding Through Explicit Text Formatting | May 19, 2025 | document understanding, Optical Character Recognition (OCR) | —Unverified | 0 |
| WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild? | May 16, 2025 | document understanding | —Unverified | 0 |
| Document Image Rectification Bases on Self-Adaptive Multitask Fusion | May 9, 2025 | document understanding | —Unverified | 0 |
| Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer | May 2, 2025 | document understanding, Hallucination | —Unverified | 0 |
| Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models | Apr 16, 2025 | document understanding, Layout Design | Code Available | 0 |
| Relation-Rich Visual Document Generator for Visual Information Extraction | Apr 14, 2025 | Diversity, document understanding | Code Available | 0 |
| NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | Apr 12, 2025 | Benchmarking, Document AI | —Unverified | 0 |
| QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding | Apr 3, 2025 | document understanding, Language Modeling | —Unverified | 0 |
| How does Watermarking Affect Visual Language Models in Document Understanding? | Apr 1, 2025 | document understanding | —Unverified | 0 |
| Improving Applicability of Deep Learning based Token Classification models during Training | Mar 28, 2025 | document understanding, token-classification | —Unverified | 0 |
| M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization? | Mar 27, 2025 | Document Summarization, document understanding | Code Available | 0 |
| BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction | Mar 25, 2025 | document understanding, object-detection | Code Available | 0 |
| SFDLA: Source-Free Document Layout Analysis | Mar 24, 2025 | Avg, Document Layout Analysis | Code Available | 0 |
| A Simple yet Effective Layout Token in Large Language Models for Document Understanding | Mar 24, 2025 | document understanding, Position | —Unverified | 0 |
| Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Mar 18, 2025 | document understanding, Question Answering | Code Available | 0 |
| PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks | Mar 6, 2025 | document understanding, Language Modeling | Code Available | 0 |
| A Token-level Text Image Foundation Model for Document Understanding | Mar 4, 2025 | document understanding, Visual Question Answering (VQA) | —Unverified | 0 |
| Zero-Shot Complex Question-Answering on Long Scientific Documents | Mar 4, 2025 | Answer Generation, document understanding | Code Available | 0 |