| M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | Jan 1, 2023 | Articles, Document Layout Analysis | Code Available | 1 |
| Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding | Dec 19, 2022 | Contrastive Learning, document understanding | Code Available | 0 |
| Multimodal Tree Decoder for Table of Contents Extraction in Document Images | Dec 6, 2022 | Decoder, document understanding | Code Available | 0 |
| Unifying Vision, Text, and Layout for Universal Document Processing | Dec 5, 2022 | Document AI, document understanding | Code Available | 3 |
| ClueWeb22: 10 Billion Web Documents with Visual and Semantic Information | Nov 29, 2022 | document understanding, Retrieval | Unverified | 0 |
| VRDU: A Benchmark for Visually-rich Document Understanding | Nov 15, 2022 | document understanding | Unverified | 0 |
| QueryForm: A Simple Zero-shot Form Entity Query Framework | Nov 14, 2022 | document understanding, Form | Unverified | 0 |
| Unimodal and Multimodal Representation Training for Relation Extraction | Nov 11, 2022 | document understanding, Relation | Unverified | 0 |
| On Web-based Visual Corpus Construction for Visual Document Understanding | Nov 7, 2022 | document understanding, Optical Character Recognition (OCR) | Code Available | 1 |
| Transformer-based Approach for Document Understanding | Oct 16, 2022 | Decoder, Document Layout Analysis | Unverified | 0 |
| ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding | Oct 12, 2022 | document-image-classification, Document Image Classification | Code Available | 1 |
| KALM: Knowledge-Aware Integration of Local, Document, and Global Contexts for Long Document Understanding | Oct 8, 2022 | document understanding, Knowledge Graphs | Code Available | 0 |
| XDoc: Unified Pre-training for Cross-Format Document Understanding | Oct 6, 2022 | document understanding, Semantic entity labeling | Code Available | 0 |
| DocQueryNet: Value Retrieval with Arbitrary Queries for Form-like Documents | Oct 1, 2022 | document understanding, Form | Code Available | 1 |
| ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding | Sep 18, 2022 | Common Sense Reasoning, document understanding | Unverified | 0 |
| One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text | Sep 12, 2022 | document understanding, object-detection | Unverified | 0 |
| Improving Keyphrase Extraction with Data Augmentation and Information Filtering | Sep 11, 2022 | Data Augmentation, document understanding | Unverified | 0 |
| Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks | Aug 23, 2022 | Document Layout Analysis, document understanding | Code Available | 1 |
| DeeperDive: The Unreasonable Effectiveness of Weak Supervision in Document Understanding A Case Study in Collaboration with UiPath Inc | Aug 17, 2022 | document understanding, Form | Unverified | 0 |
| Understanding Long Documents with Different Position-Aware Attentions | Aug 17, 2022 | document understanding, Position | Unverified | 0 |
| Knowing Where and What: Unified Word Block Pretraining for Document Understanding | Jul 28, 2022 | Contrastive Learning, document understanding | Code Available | 0 |
| Towards Complex Document Understanding By Discrete Reasoning | Jul 25, 2022 | document understanding, Question Answering | Unverified | 0 |
| DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding | Jul 14, 2022 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 |
| Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding | Jun 27, 2022 | Document Classification, document understanding | Unverified | 0 |
| Test-Time Adaptation for Visual Document Understanding | Jun 15, 2022 | document understanding, Domain Adaptation | Unverified | 0 |
| RDU: A Region-based Approach to Form-style Document Understanding | Jun 14, 2022 | document understanding, Form | Unverified | 0 |
| Génération de question à partir d’analyse sémantique pour l’adaptation non supervisée de modèles de compréhension de documents (Question generation from semantic analysis for unsupervised adaptation of document understanding models) | Jun 1, 2022 | document understanding, Question Generation | Unverified | 0 |
| Delivering Document Conversion as a Cloud Service with High Throughput and Responsiveness | Jun 1, 2022 | CPU, document understanding | Code Available | 2 |
| MATrIX -- Modality-Aware Transformer for Information eXtraction | May 17, 2022 | document understanding | Unverified | 0 |
| MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding | May 1, 2022 | document understanding | Code Available | 0 |
| XFUND: A Benchmark Dataset for Multilingual Visually Rich Form Understanding | May 1, 2022 | document understanding, Form | Unverified | 0 |
| DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering | May 1, 2022 | document understanding, Open-Domain Question Answering | Unverified | 0 |
| Unified Pretraining Framework for Document Understanding | Apr 22, 2022 | Document Layout Analysis, document understanding | Unverified | 0 |
| End-to-end Document Recognition and Understanding with Dessurt | Mar 30, 2022 | document understanding, Visual Question Answering (VQA) | Code Available | 1 |
| Multimodal Pre-training Based on Graph Attention Network for Document Understanding | Mar 25, 2022 | document understanding, Graph Attention | Code Available | 1 |
| Robust Text Line Detection in Historical Documents: Learning and Evaluation Methods | Mar 23, 2022 | document understanding, Line Detection | Unverified | 0 |
| FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction | Mar 16, 2022 | Document AI, document understanding | Unverified | 0 |
| XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding | Mar 14, 2022 | document understanding, Optical Character Recognition (OCR) | Code Available | 1 |
| Hierarchical BERT for Medical Document Understanding | Mar 11, 2022 | document understanding, Sentence | Unverified | 0 |
| LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding | Feb 28, 2022 | Document Image Classification, document understanding | Code Available | 2 |
| WebFormer: The Web-page Transformer for Structure Information Extraction | Feb 1, 2022 | Deep Attention, document understanding | Unverified | 0 |
| ERNIE-Layout: Layout-Knowledge Enhanced Multi-modal Pre-training for Document Understanding | Jan 16, 2022 | cross-modal alignment, Document Classification | Code Available | 0 |
| LoPE: Learnable Sinusoidal Positional Encoding for Improving Document Transformer Model | Jan 16, 2022 | document understanding | Unverified | 0 |
| Efficient layout-aware pretraining for multimodal form understanding | Jan 16, 2022 | document understanding, Form | Unverified | 0 |
| Deeper Clinical Document Understanding Using Relation Extraction | Dec 25, 2021 | document understanding, named-entity-recognition | Code Available | 0 |
| Value Retrieval with Arbitrary Queries for Form-like Documents | Dec 15, 2021 | document understanding, Form | Code Available | 1 |
| UniDoc: Unified Pretraining Framework for Document Understanding | Dec 1, 2021 | document understanding, Self-Supervised Learning | Unverified | 0 |
| OCR-free Document Understanding Transformer | Nov 30, 2021 | Document Image Classification, document understanding | Code Available | 3 |
| SimCLAD: A Simple Framework for Contrastive Learning of Acronym Disambiguation | Nov 29, 2021 | Contrastive Learning, document understanding | Unverified | 0 |
| PSG: Prompt-based Sequence Generation for Acronym Extraction | Nov 29, 2021 | document understanding, Language Modeling | Unverified | 0 |