| ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding | Oct 12, 2022 | document-image-classification, Document Image Classification | Code Available | 1 |
| ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark | May 22, 2025 | document understanding, Multimodal Reasoning | Code Available | 1 |
| On Web-based Visual Corpus Construction for Visual Document Understanding | Nov 7, 2022 | document understanding, Optical Character Recognition (OCR) | Code Available | 1 |
| Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding | Sep 29, 2024 | document understanding, Entity Linking | Code Available | 1 |
| M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | Jan 1, 2023 | Articles, Document Layout Analysis | Code Available | 1 |
| Multimodal Pre-training Based on Graph Attention Network for Document Understanding | Mar 25, 2022 | document understanding, Graph Attention | Code Available | 1 |
| DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Aug 27, 2024 | document understanding, Optical Character Recognition (OCR) | Code Available | 1 |
| LineFormer: Rethinking Line Chart Data Extraction as Instance Segmentation | May 3, 2023 | Data Visualization, document understanding | Code Available | 1 |
| LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating | Dec 24, 2024 | document understanding, Question Answering | Code Available | 1 |
| Ocean-OCR: Towards General OCR Application via a Vision-Language Model | Jan 26, 2025 | document understanding, Language Modeling | Code Available | 1 |
| LEMONADE: A Large Multilingual Expert-Annotated Abstractive Event Dataset for the Real World | Jun 1, 2025 | document understanding, Entity Linking | Code Available | 1 |
| A Discrete Variational Recurrent Topic Model without the Reparametrization Trick | Oct 22, 2020 | document understanding, Variational Inference | Code Available | 1 |
| Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning | Jun 4, 2024 | document understanding, GPU | Code Available | 1 |
| Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | Feb 18, 2021 | Decoder, Document Image Classification | Code Available | 1 |
| Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks | Aug 23, 2022 | Document Layout Analysis, document understanding | Code Available | 1 |
| Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding | Feb 28, 2024 | document understanding, Information Retrieval | Code Available | 1 |
| FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding | Apr 24, 2025 | document understanding, MME | Code Available | 1 |
| DocFormer: End-to-End Transformer for Document Understanding | Jun 22, 2021 | Document Image Classification, document understanding | Code Available | 1 |
| DocFormerv2: Local Features for Document Understanding | Jun 2, 2023 | Decoder, document understanding | Code Available | 1 |
| DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning | Jun 5, 2025 | document understanding, Event Detection | Unverified | 0 |
| BERT-AL: BERT for Arbitrarily Long Document Understanding | Jan 1, 2020 | document understanding, Text Summarization | Unverified | 0 |
| Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review | Jul 23, 2024 | Deep Learning, document understanding | Unverified | 0 |
| DeeperDive: The Unreasonable Effectiveness of Weak Supervision in Document Understanding A Case Study in Collaboration with UiPath Inc | Aug 17, 2022 | document understanding, Form | Unverified | 0 |
| AWESOME: GPU Memory-constrained Long Document Summarization using Memory Mechanism and Global Salient Content | May 24, 2023 | Document Summarization, document understanding | Unverified | 0 |
| A Retrospective Recount of Computer Architecture Research with a Data-Driven Study of Over Four Decades of ISCA Publications | Jun 22, 2019 | document understanding, Natural Language Understanding | Unverified | 0 |
| Automatic Knowledge Extraction with Human Interface | Apr 9, 2021 | document understanding | Unverified | 0 |
| Decontextualization: Making Sentences Stand-Alone | Feb 9, 2021 | document understanding, Question Answering | Unverified | 0 |
| Arctic-TILT. Business Document Understanding at Sub-Billion Scale | Aug 8, 2024 | document understanding, GPU | Unverified | 0 |
| DAViD: Domain Adaptive Visually-Rich Document Understanding with Synthetic Insights | Oct 2, 2024 | document understanding, Domain Adaptation | Unverified | 0 |
| Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer | May 2, 2025 | document understanding, Hallucination | Unverified | 0 |
| Finding Pragmatic Differences Between Disciplines | Sep 30, 2023 | Diversity, Document Summarization | Unverified | 0 |
| Auto-encodeurs pour la compréhension de documents parlés (Auto-encoders for Spoken Document Understanding) | Jul 1, 2016 | document understanding | Unverified | 0 |
| A User-Centered Concept Mining System for Query and Document Understanding at Tencent | May 21, 2019 | document understanding, Knowledge Base Construction | Unverified | 0 |
| CREPE: Coordinate-Aware End-to-End Document Parser | May 1, 2024 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 |
| WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild? | May 16, 2025 | document understanding | Unverified | 0 |
| ClueWeb22: 10 Billion Web Documents with Visual and Semantic Information | Nov 29, 2022 | document understanding, Retrieval | Unverified | 0 |
| Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration | Sep 3, 2023 | Decoder, document understanding | Unverified | 0 |
| DrVideo: Document Retrieval Based Long Video Understanding | Jun 18, 2024 | document understanding, EgoSchema | Unverified | 0 |
| Attention-Based Graph Neural Network with Global Context Awareness for Document Understanding | Oct 1, 2020 | document understanding, graph construction | Unverified | 0 |
| Acronym Identification and Disambiguation Shared Tasks for Scientific Document Understanding | Dec 22, 2020 | document understanding | Unverified | 0 |
| Extract with Order for Coherent Multi-Document Summarization | Jun 12, 2017 | Document Summarization, document understanding | Unverified | 0 |
| Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding | May 19, 2023 | document understanding | Unverified | 0 |
| FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction | Mar 16, 2022 | Document AI, document understanding | Unverified | 0 |
| A Multi-Modal Multilingual Benchmark for Document Image Classification | Oct 25, 2023 | Classification, Cross-Lingual Transfer | Unverified | 0 |
| Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models | Feb 29, 2024 | Contrastive Learning, document understanding | Unverified | 0 |
| DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency | Nov 9, 2023 | document understanding, Key Information Extraction | Unverified | 0 |
| DOGE: Towards Versatile Visual Document Grounding and Referring | Nov 26, 2024 | document understanding | Unverified | 0 |
| DUBLIN -- Document Understanding By Language-Image Network | May 23, 2023 | Document Classification, document understanding | Unverified | 0 |
| Efficient End-to-End Visual Document Understanding with Rationale Distillation | Nov 16, 2023 | document understanding, Image to text | Unverified | 0 |
| A Token-level Text Image Foundation Model for Document Understanding | Mar 4, 2025 | document understanding, Visual Question Answering (VQA) | Unverified | 0 |