| Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding | Feb 28, 2024 | document understanding, Information Retrieval | Code Available | 1 | 5 |
| DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Aug 27, 2024 | document understanding, Optical Character Recognition (OCR) | Code Available | 1 | 5 |
| DANIEL: A fast Document Attention Network for Information Extraction and Labelling of handwritten documents | Jul 12, 2024 | Document Layout Analysis, document understanding | Code Available | 1 | 5 |
| FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding | Apr 24, 2025 | document understanding, MME | Code Available | 1 | 5 |
| Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs | Nov 22, 2023 | document understanding, Instruction Following | Code Available | 1 | 5 |
| Value Retrieval with Arbitrary Queries for Form-like Documents | Dec 15, 2021 | document understanding, Form | Code Available | 1 | 5 |
| Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding | Sep 29, 2024 | document understanding, Entity Linking | Code Available | 1 | 5 |
| On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling | Jan 25, 2024 | Decoder, Diversity | Code Available | 1 | 5 |
| DocFormerv2: Local Features for Document Understanding | Jun 2, 2023 | Decoder, document understanding | Code Available | 1 | 5 |
| ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding | Oct 12, 2022 | document-image-classification, Document Image Classification | Code Available | 1 | 5 |
| SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement | Jun 16, 2025 | document understanding, Question Answering | Code Available | 1 | 5 |
| Enhancing Visually-Rich Document Understanding via Layout Structure Modeling | Aug 15, 2023 | document understanding | Code Available | 1 | 5 |
| Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks | Aug 23, 2022 | Document Layout Analysis, document understanding | Code Available | 1 | 5 |
| PaLI-X: On Scaling up a Multilingual Vision and Language Model | May 29, 2023 | Chart Question Answering, document understanding | Code Available | 1 | 5 |
| End-to-end Document Recognition and Understanding with Dessurt | Mar 30, 2022 | document understanding, Visual Question Answering (VQA) | Code Available | 1 | 5 |
| On Web-based Visual Corpus Construction for Visual Document Understanding | Nov 7, 2022 | document understanding, Optical Character Recognition (OCR) | Code Available | 1 | 5 |
| A Discrete Variational Recurrent Topic Model without the Reparametrization Trick | Oct 22, 2020 | document understanding, Variational Inference | Code Available | 1 | 5 |
| DocFormer: End-to-End Transformer for Document Understanding | Jun 22, 2021 | Document Image Classification, document understanding | Code Available | 1 | 5 |
| VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding | Jul 17, 2024 | document understanding, Optical Character Recognition (OCR) | Code Available | 1 | 5 |
| Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Apr 29, 2024 | document understanding, GPU | Code Available | 0 | 5 |
| Multimodal Tree Decoder for Table of Contents Extraction in Document Images | Dec 6, 2022 | Decoder, document understanding | Code Available | 0 | 5 |
| Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report | Jun 17, 2024 | document understanding | Code Available | 0 | 5 |
| Multimodal weighted graph representation for information extraction from visually rich documents. | Jan 5, 2024 | Document Layout Analysis, document understanding | Code Available | 0 | 5 |
| Deeper Clinical Document Understanding Using Relation Extraction | Dec 25, 2021 | document understanding, named-entity-recognition | Code Available | 0 | 5 |
| Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting | May 21, 2024 | document-image-classification, Document Image Classification | Code Available | 0 | 5 |
| Message Passing Attention Networks for Document Understanding | Aug 17, 2019 | document understanding, Multi-Modal Document Classification | Code Available | 0 | 5 |
| Data-driven Coreference-based Ontology Building | Oct 22, 2024 | coreference-resolution, Coreference Resolution | Code Available | 0 | 5 |
| Matching Article Pairs with Graphical Decomposition and Convolutions | Feb 21, 2018 | Articles, document understanding | Code Available | 0 | 5 |
| M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization? | Mar 27, 2025 | Document Summarization, document understanding | Code Available | 0 | 5 |
| MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding | Oct 16, 2021 | document understanding | Code Available | 0 | 5 |
| MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding | May 1, 2022 | document understanding | Code Available | 0 | 5 |
| 3MVRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding | Feb 28, 2024 | document understanding, Form | Code Available | 0 | 5 |
| M^6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | May 15, 2023 | Articles, Document Layout Analysis | Code Available | 0 | 5 |
| DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Jun 26, 2025 | document understanding, Optical Character Recognition (OCR) | Code Available | 0 | 5 |
| Class-Agnostic Region-of-Interest Matching in Document Images | Jun 26, 2025 | Document Layout Analysis, document understanding | Code Available | 0 | 5 |
| Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Mar 18, 2025 | document understanding, Question Answering | Code Available | 0 | 5 |
| ChuLo: Chunk-Level Key Information Representation for Long Document Processing | Oct 14, 2024 | Chunking, Classification | Code Available | 0 | 5 |
| Chargrid: Towards Understanding 2D Documents | Sep 24, 2018 | Decoder, document understanding | Code Available | 0 | 5 |
| LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding | Apr 18, 2021 | Document Image Classification, document understanding | Code Available | 0 | 5 |
| LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | Dec 29, 2020 | Document Image Classification, Document Layout Analysis | Code Available | 0 | 5 |
| Learned Compression for Compressed Learning | Dec 12, 2024 | Colorization, document understanding | Code Available | 0 | 5 |
| LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding | Apr 8, 2024 | Document AI, document understanding | Code Available | 0 | 5 |
| Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models | Jun 5, 2023 | document understanding, Question Answering | Code Available | 0 | 5 |
| Is ChatGPT A Good Keyphrase Generator? A Preliminary Study | Mar 23, 2023 | Diversity, document understanding | Code Available | 0 | 5 |
| Information Redundancy and Biases in Public Document Information Extraction Benchmarks | Apr 28, 2023 | document understanding, Key Information Extraction | Code Available | 0 | 5 |
| KALM: Knowledge-Aware Integration of Local, Document, and Global Contexts for Long Document Understanding | Oct 8, 2022 | document understanding, Knowledge Graphs | Code Available | 0 | 5 |
| Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing | Jun 1, 2025 | Document AI, document understanding | Code Available | 0 | 5 |
| Improving Clinical Document Understanding on COVID-19 Research with Spark NLP | Dec 7, 2020 | Anatomy, Clinical Assertion Status Detection | Code Available | 0 | 5 |
| Machine Unlearning for Document Classification | Apr 29, 2024 | Classification, Document Classification | Code Available | 0 | 5 |
| Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network | Sep 11, 2024 | Document Layout Analysis, document understanding | Code Available | 0 | 5 |