| Deeper Clinical Document Understanding Using Relation Extraction | Dec 25, 2021 | document understanding, named-entity-recognition | Code Available | 0 |
| DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering | May 1, 2022 | document understanding, Open-Domain Question Answering | Code Available | 0 |
| Relation-Rich Visual Document Generator for Visual Information Extraction | Apr 14, 2025 | Diversity, document understanding | Code Available | 0 |
| 3MVRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding | Feb 28, 2024 | document understanding, Form | Code Available | 0 |
| M^6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | May 15, 2023 | Articles, Document Layout Analysis | Code Available | 0 |
| Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network | Sep 11, 2024 | Document Layout Analysis, document understanding | Code Available | 0 |
| Machine Unlearning for Document Classification | Apr 29, 2024 | Classification, Document Classification | Code Available | 0 |
| MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding | Oct 16, 2021 | document understanding | Code Available | 0 |
| MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding | May 1, 2022 | document understanding | Code Available | 0 |
| Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Mar 18, 2025 | document understanding, Question Answering | Code Available | 0 |
| XDoc: Unified Pre-training for Cross-Format Document Understanding | Oct 6, 2022 | document understanding, Semantic entity labeling | Code Available | 0 |
| Zero-Shot Complex Question-Answering on Long Scientific Documents | Mar 4, 2025 | Answer Generation, document understanding | Code Available | 0 |
| Matching Article Pairs with Graphical Decomposition and Convolutions | Feb 21, 2018 | Articles, document understanding | Code Available | 0 |
| ChuLo: Chunk-Level Key Information Representation for Long Document Processing | Oct 14, 2024 | Chunking, Classification | Code Available | 0 |
| Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing | Jun 1, 2025 | Document AI, document understanding | Code Available | 0 |
| M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization? | Mar 27, 2025 | Document Summarization, document understanding | Code Available | 0 |
| Improving Clinical Document Understanding on COVID-19 Research with Spark NLP | Dec 7, 2020 | Anatomy, Clinical Assertion Status Detection | Code Available | 0 |
| Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding | Jul 19, 2024 | document understanding, Informativeness | Code Available | 0 |
| BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction | Mar 25, 2025 | document understanding, object-detection | Code Available | 0 |
| Message Passing Attention Networks for Document Understanding | Aug 17, 2019 | document understanding, Multi-Modal Document Classification | Code Available | 0 |
| Chargrid: Towards Understanding 2D Documents | Sep 24, 2018 | Decoder, document understanding | Code Available | 0 |
| SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap | Sep 21, 2023 | Contrastive Learning, document understanding | Code Available | 0 |
| Hypergraph based Understanding for Document Semantic Entity Recognition | Jul 9, 2024 | document understanding | Code Available | 0 |
| HERITAGE: An End-to-End Web Platform for Processing Korean Historical Documents in Hanja | Jan 21, 2025 | document understanding, Machine Translation | Code Available | 0 |
| mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding | Mar 19, 2024 | document understanding, Optical Character Recognition (OCR) | Code Available | 0 |
| mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding | Sep 5, 2024 | document understanding, GPU | Code Available | 0 |
| mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding | Jul 4, 2023 | document understanding, Language Modeling | Code Available | 0 |
| A Survey of Deep Learning Approaches for OCR and Document Understanding | Nov 27, 2020 | document understanding, Optical Character Recognition (OCR) | Code Available | 0 |
| Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting | May 21, 2024 | document-image-classification, Document Image Classification | Code Available | 0 |
| DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding | Jul 14, 2022 | document understanding, Optical Character Recognition (OCR) | Code Available | 0 |
| GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding | May 6, 2024 | Contrastive Learning, document understanding | Code Available | 0 |
| Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report | Jun 17, 2024 | document understanding | Code Available | 0 |
| Multimodal Tree Decoder for Table of Contents Extraction in Document Images | Dec 6, 2022 | Decoder, document understanding | Code Available | 0 |
| Multimodal weighted graph representation for information extraction from visually rich documents. | Jan 5, 2024 | Document Layout Analysis, document understanding | Code Available | 0 |
| Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Apr 29, 2024 | document understanding, GPU | Code Available | 0 |
| SFDLA: Source-Free Document Layout Analysis | Mar 24, 2025 | Avg, Document Layout Analysis | Code Available | 0 |
| Blockwise Self-Attention for Long Document Understanding | Nov 7, 2019 | document understanding, Language Modeling | Code Available | 0 |
| Data-driven Coreference-based Ontology Building | Oct 22, 2024 | coreference-resolution, Coreference Resolution | Code Available | 0 |
| DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond | Oct 19, 2023 | Document AI, Document Layout Analysis | Code Available | 0 |
| Financial Report Chunking for Effective Retrieval Augmented Generation | Feb 5, 2024 | Chunking, document understanding | Code Available | 0 |
| OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition | Mar 28, 2024 | Decoder, document understanding | Code Available | 0 |
| OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition | Jan 1, 2024 | Decoder, document understanding | Code Available | 0 |
| OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models | Feb 22, 2025 | document understanding, Key Information Extraction | Code Available | 0 |
| KRED: Knowledge-Aware Document Representation for News Recommendations | Oct 25, 2019 | Articles, document understanding | Code Available | 0 |
| Skim-Attention: Learning to Focus via Document Layout | Sep 2, 2021 | document understanding, Language Modeling | Code Available | 0 |
| Vision Grid Transformer for Document Layout Analysis | Aug 29, 2023 | Document AI, Document Layout Analysis | Code Available | 0 |
| Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Dec 6, 2024 | document understanding, Hallucination | Code Available | 0 |
| Long Context Compression with Activation Beacon | Jan 7, 2024 | 4k, document understanding | Code Available | 0 |
| Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models | Apr 16, 2025 | document understanding, Layout Design | Code Available | 0 |
| PaddleOCR 3.0 Technical Report | Jul 8, 2025 | document understanding, Key Information Extraction | Code Available | 0 |