| OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models | Feb 22, 2025 | document understanding, Key Information Extraction | Code Available | 0 | 5 |
| OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition | Jan 1, 2024 | Decoder, document understanding | Code Available | 0 | 5 |
| Multimodal weighted graph representation for information extraction from visually rich documents | Jan 5, 2024 | Document Layout Analysis, document understanding | Code Available | 0 | 5 |
| Multimodal Tree Decoder for Table of Contents Extraction in Document Images | Dec 6, 2022 | Decoder, document understanding | Code Available | 0 | 5 |
| Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Apr 29, 2024 | document understanding, GPU | Code Available | 0 | 5 |
| ERNIE-Layout: Layout-Knowledge Enhanced Multi-modal Pre-training for Document Understanding | Jan 16, 2022 | cross-modal alignment, Document Classification | Code Available | 0 | 5 |
| Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report | Jun 17, 2024 | document understanding | Code Available | 0 | 5 |
| Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting | May 21, 2024 | document-image-classification, Document Image Classification | Code Available | 0 | 5 |
| EvaLDA: Efficient Evasion Attacks Towards Latent Dirichlet Allocation | Dec 9, 2020 | document understanding, Information Retrieval | Code Available | 0 | 5 |
| Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models | Apr 16, 2025 | document understanding, Layout Design | Code Available | 0 | 5 |
| mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding | Jul 4, 2023 | document understanding, Language Modeling | Code Available | 0 | 5 |
| OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition | Mar 28, 2024 | Decoder, document understanding | Code Available | 0 | 5 |
| PP-DocBee2: Improved Baselines with Efficient Data for Multimodal Document Understanding | Jun 22, 2025 | document understanding | Code Available | 0 | 5 |
| DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding | Jul 14, 2022 | document understanding, Optical Character Recognition (OCR) | Code Available | 0 | 5 |
| DocMIA: Document-Level Membership Inference Attacks against DocVQA Models | Feb 6, 2025 | document understanding, Inference Attack | Code Available | 0 | 5 |
| Message Passing Attention Networks for Document Understanding | Aug 17, 2019 | document understanding, Multi-Modal Document Classification | Code Available | 0 | 5 |
| A Survey of Deep Learning Approaches for OCR and Document Understanding | Nov 27, 2020 | document understanding, Optical Character Recognition (OCR) | Code Available | 0 | 5 |
| Blockwise Self-Attention for Long Document Understanding | Nov 7, 2019 | document understanding, Language Modeling | Code Available | 0 | 5 |
| Is ChatGPT A Good Keyphrase Generator? A Preliminary Study | Mar 23, 2023 | Diversity, document understanding | Code Available | 0 | 5 |
| Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Mar 18, 2025 | document understanding, Question Answering | Code Available | 0 | 5 |
| MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding | May 1, 2022 | document understanding | Code Available | 0 | 5 |
| Matching Article Pairs with Graphical Decomposition and Convolutions | Feb 21, 2018 | Articles, document understanding | Code Available | 0 | 5 |
| M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization? | Mar 27, 2025 | Document Summarization, document understanding | Code Available | 0 | 5 |
| mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding | Mar 19, 2024 | document understanding, Optical Character Recognition (OCR) | Code Available | 0 | 5 |
| M^6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | May 15, 2023 | Articles, Document Layout Analysis | Code Available | 0 | 5 |
| Improving Clinical Document Understanding on COVID-19 Research with Spark NLP | Dec 7, 2020 | Anatomy, Clinical Assertion Status Detection | Code Available | 0 | 5 |
| Long-Range Transformer Architectures for Document Understanding | Sep 11, 2023 | document understanding, Information Retrieval | Code Available | 0 | 5 |
| Hypergraph based Understanding for Document Semantic Entity Recognition | Jul 9, 2024 | document understanding | Code Available | 0 | 5 |
| Machine Unlearning for Document Classification | Apr 29, 2024 | Classification, Document Classification | Code Available | 0 | 5 |
| Bidirectional Context-Aware Hierarchical Attention Network for Document Understanding | Aug 16, 2019 | Abstractive Text Summarization, document understanding | Code Available | 0 | 5 |
| LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding | Apr 18, 2021 | Document Image Classification, document understanding | Code Available | 0 | 5 |
| Learned Compression for Compressed Learning | Dec 12, 2024 | Colorization, document understanding | Code Available | 0 | 5 |
| HERITAGE: An End-to-End Web Platform for Processing Korean Historical Documents in Hanja | Jan 21, 2025 | document understanding, Machine Translation | Code Available | 0 | 5 |
| BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction | Mar 25, 2025 | document understanding, object-detection | Code Available | 0 | 5 |
| LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding | Apr 8, 2024 | Document AI, document understanding | Code Available | 0 | 5 |
| KALM: Knowledge-Aware Integration of Local, Document, and Global Contexts for Long Document Understanding | Oct 8, 2022 | document understanding, Knowledge Graphs | Code Available | 0 | 5 |
| Knowing Where and What: Unified Word Block Pretraining for Document Understanding | Jul 28, 2022 | Contrastive Learning, document understanding | Code Available | 0 | 5 |
| LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | Dec 29, 2020 | Document Image Classification, Document Layout Analysis | Code Available | 0 | 5 |
| MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding | Oct 16, 2021 | document understanding | Code Available | 0 | 5 |
| mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding | Sep 5, 2024 | document understanding, GPU | Code Available | 0 | 5 |
| 3MVRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding | Feb 28, 2024 | document understanding, Form | Code Available | 0 | 5 |
| Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing | Jun 1, 2025 | Document AI, document understanding | Code Available | 0 | 5 |
| PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks | Mar 6, 2025 | document understanding, Language Modeling | Code Available | 0 | 5 |
| Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network | Sep 11, 2024 | Document Layout Analysis, document understanding | Code Available | 0 | 5 |
| DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning | Jun 5, 2025 | document understanding, Event Detection | Unverified | 0 | 0 |
| Génération de question à partir d’analyse sémantique pour l’adaptation non supervisée de modèles de compréhension de documents (Question generation from semantic analysis for unsupervised adaptation of document understanding models) | Jun 1, 2022 | document understanding, Question Generation | Unverified | 0 | 0 |
| BERT-AL: BERT for Arbitrarily Long Document Understanding | Jan 1, 2020 | document understanding, Text Summarization | Unverified | 0 | 0 |
| From Entity Linking to Question Answering -- Recent Progress on Semantic Grounding Tasks | Dec 1, 2016 | document understanding, Entity Linking | Unverified | 0 | 0 |
| Friendly Topic Assistant for Transformer Based Abstractive Summarization | Nov 1, 2020 | Abstractive Text Summarization, Document Summarization | Unverified | 0 | 0 |
| Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review | Jul 23, 2024 | Deep Learning, document understanding | Unverified | 0 | 0 |