| Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models | May 24, 2023 | document understanding, Image Captioning | Code Available | 1 | 5 |
| ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark | May 22, 2025 | document understanding, Multimodal Reasoning | Code Available | 1 | 5 |
| DANIEL: A fast Document Attention Network for Information Extraction and Labelling of handwritten documents | Jul 12, 2024 | Document Layout Analysis, document understanding | Code Available | 1 | 5 |
| SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement | Jun 16, 2025 | document understanding, Question Answering | Code Available | 1 | 5 |
| Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding | Feb 28, 2024 | document understanding, Information Retrieval | Code Available | 1 | 5 |
| FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding | Apr 24, 2025 | document understanding, MME | Code Available | 1 | 5 |
| Privacy-Aware Document Visual Question Answering | Dec 15, 2023 | document understanding, Federated Learning | Code Available | 1 | 5 |
| Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs | Nov 22, 2023 | document understanding, Instruction Following | Code Available | 1 | 5 |
| DocFormerv2: Local Features for Document Understanding | Jun 2, 2023 | Decoder, document understanding | Code Available | 1 | 5 |
| ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding | Oct 12, 2022 | document-image-classification, Document Image Classification | Code Available | 1 | 5 |
| PaLI-X: On Scaling up a Multilingual Vision and Language Model | May 29, 2023 | Chart Question Answering, document understanding | Code Available | 1 | 5 |
| DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Jan 1, 2025 | document understanding, Optical Character Recognition (OCR) | Code Available | 1 | 5 |
| Enhancing Visually-Rich Document Understanding via Layout Structure Modeling | Aug 15, 2023 | document understanding | Code Available | 1 | 5 |
| Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks | Aug 23, 2022 | Document Layout Analysis, document understanding | Code Available | 1 | 5 |
| Ocean-OCR: Towards General OCR Application via a Vision-Language Model | Jan 26, 2025 | document understanding, Language Modeling | Code Available | 1 | 5 |
| End-to-end Document Recognition and Understanding with Dessurt | Mar 30, 2022 | document understanding, Visual Question Answering (VQA) | Code Available | 1 | 5 |
| A Discrete Variational Recurrent Topic Model without the Reparametrization Trick | Oct 22, 2020 | document understanding, Variational Inference | Code Available | 1 | 5 |
| DocFormer: End-to-End Transformer for Document Understanding | Jun 22, 2021 | Document Image Classification, document understanding | Code Available | 1 | 5 |
| Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution | Jan 24, 2021 | 3D Feature Matching, document understanding | Code Available | 1 | 5 |
| mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding | Jul 4, 2023 | document understanding, Language Modeling | Code Available | 0 | 5 |
| Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting | May 21, 2024 | document-image-classification, Document Image Classification | Code Available | 0 | 5 |
| mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding | Mar 19, 2024 | document understanding, Optical Character Recognition (OCR) | Code Available | 0 | 5 |
| mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding | Sep 5, 2024 | document understanding, GPU | Code Available | 0 | 5 |
| Deeper Clinical Document Understanding Using Relation Extraction | Dec 25, 2021 | document understanding, named-entity-recognition | Code Available | 0 | 5 |
| Message Passing Attention Networks for Document Understanding | Aug 17, 2019 | document understanding, Multi-Modal Document Classification | Code Available | 0 | 5 |
| DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding | Jul 14, 2022 | document understanding, Optical Character Recognition (OCR) | Code Available | 0 | 5 |
| M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization? | Mar 27, 2025 | Document Summarization, document understanding | Code Available | 0 | 5 |
| Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report | Jun 17, 2024 | document understanding | Code Available | 0 | 5 |
| Data-driven Coreference-based Ontology Building | Oct 22, 2024 | coreference-resolution, Coreference Resolution | Code Available | 0 | 5 |
| Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Mar 18, 2025 | document understanding, Question Answering | Code Available | 0 | 5 |
| MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding | May 1, 2022 | document understanding | Code Available | 0 | 5 |
| Matching Article Pairs with Graphical Decomposition and Convolutions | Feb 21, 2018 | Articles, document understanding | Code Available | 0 | 5 |
| M^6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | May 15, 2023 | Articles, Document Layout Analysis | Code Available | 0 | 5 |
| Machine Unlearning for Document Classification | Apr 29, 2024 | Classification, Document Classification | Code Available | 0 | 5 |
| Long-Range Transformer Architectures for Document Understanding | Sep 11, 2023 | document understanding, Information Retrieval | Code Available | 0 | 5 |
| DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Jun 26, 2025 | document understanding, Optical Character Recognition (OCR) | Code Available | 0 | 5 |
| Class-Agnostic Region-of-Interest Matching in Document Images | Jun 26, 2025 | Document Layout Analysis, document understanding | Code Available | 0 | 5 |
| 3MVRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding | Feb 28, 2024 | document understanding, Form | Code Available | 0 | 5 |
| MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding | Oct 16, 2021 | document understanding | Code Available | 0 | 5 |
| Multimodal Tree Decoder for Table of Contents Extraction in Document Images | Dec 6, 2022 | Decoder, document understanding | Code Available | 0 | 5 |
| Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models | Jun 5, 2023 | document understanding, Question Answering | Code Available | 0 | 5 |
| ChuLo: Chunk-Level Key Information Representation for Long Document Processing | Oct 14, 2024 | Chunking, Classification | Code Available | 0 | 5 |
| Chargrid: Towards Understanding 2D Documents | Sep 24, 2018 | Decoder, document understanding | Code Available | 0 | 5 |
| DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond | Oct 19, 2023 | Document AI, Document Layout Analysis | Code Available | 0 | 5 |
| LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | Dec 29, 2020 | Document Image Classification, Document Layout Analysis | Code Available | 0 | 5 |
| LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding | Apr 8, 2024 | Document AI, document understanding | Code Available | 0 | 5 |
| LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding | Apr 18, 2021 | Document Image Classification, document understanding | Code Available | 0 | 5 |
| Knowing Where and What: Unified Word Block Pretraining for Document Understanding | Jul 28, 2022 | Contrastive Learning, document understanding | Code Available | 0 | 5 |
| Learned Compression for Compressed Learning | Dec 12, 2024 | Colorization, document understanding | Code Available | 0 | 5 |
| Information Redundancy and Biases in Public Document Information Extraction Benchmarks | Apr 28, 2023 | document understanding, Key Information Extraction | Code Available | 0 | 5 |