| DrVideo: Document Retrieval Based Long Video Understanding | Jun 18, 2024 | document understanding, EgoSchema | —Unverified | 0 |
| DUBLIN -- Document Understanding By Language-Image Network | May 23, 2023 | Document Classification, document understanding | —Unverified | 0 |
| Efficient End-to-End Visual Document Understanding with Rationale Distillation | Nov 16, 2023 | document understanding, Image to text | —Unverified | 0 |
| Efficient layout-aware pretraining for multimodal form understanding | Jan 16, 2022 | document understanding, Form | —Unverified | 0 |
| Enhancing Question Answering on Charts Through Effective Pre-training Tasks | Jun 14, 2024 | document understanding, Optical Character Recognition (OCR) | —Unverified | 0 |
| Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models | Feb 29, 2024 | Contrastive Learning, document understanding | —Unverified | 0 |
| Enumeration of Extractive Oracle Summaries | Jan 6, 2017 | document understanding, Extractive Summarization | —Unverified | 0 |
| ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding | Sep 18, 2022 | Common Sense Reasoning, document understanding | —Unverified | 0 |
| Extract with Order for Coherent Multi-Document Summarization | Jun 12, 2017 | Document Summarization, document understanding | —Unverified | 0 |
| Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding | May 19, 2023 | document understanding | —Unverified | 0 |
| Finding Pragmatic Differences Between Disciplines | Sep 30, 2023 | Diversity, Document Summarization | —Unverified | 0 |
| FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction | Mar 16, 2022 | Document AI, document understanding | —Unverified | 0 |
| Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5 | Sep 17, 2024 | document understanding, Transfer Learning | —Unverified | 0 |
| Leveraging Domain Agnostic and Specific Knowledge for Acronym Disambiguation | Jul 1, 2021 | document understanding, Word Embeddings | —Unverified | 0 |
| Leveraging Long-Context Large Language Models for Multi-Document Understanding and Summarization in Enterprise Applications | Sep 27, 2024 | Diversity, Document Summarization | —Unverified | 0 |
| LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents | Jan 26, 2024 | 4k, Document AI | —Unverified | 0 |
| LoPE: Learnable Sinusoidal Positional Encoding for Improving Document Transformer Model | Jan 16, 2022 | document understanding | —Unverified | 0 |
| LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding | Nov 2, 2024 | document understanding, Question Answering | —Unverified | 0 |
| M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | Nov 7, 2024 | document understanding, Optical Character Recognition | —Unverified | 0 |
| MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary | Jul 24, 2023 | document understanding, Optical Character Recognition (OCR) | —Unverified | 0 |
| MATATA: Weakly Supervised End-to-End MAthematical Tool-Augmented Reasoning for Tabular Applications | Nov 28, 2024 | document understanding, Mathematical Reasoning | —Unverified | 0 |
| MATrIX -- Modality-Aware Transformer for Information eXtraction | May 17, 2022 | document understanding | —Unverified | 0 |
| Memory-Augmented Agent Training for Business Document Understanding | Dec 17, 2024 | document understanding | —Unverified | 0 |
| Merge and Recognize: A Geometry and 2D Context Aware Graph Model for Named Entity Recognition from Visual Documents | Dec 1, 2020 | document understanding, Language Modeling | —Unverified | 0 |
| M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework | Nov 9, 2024 | document understanding, Question Answering | —Unverified | 0 |
| MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding | Oct 25, 2024 | Benchmarking, document understanding | —Unverified | 0 |
| MT^3: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning | May 26, 2025 | document understanding, Machine Translation | —Unverified | 0 |
| Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web | Jul 1, 2020 | document understanding, Entity Linking | —Unverified | 0 |
| NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition | Jul 16, 2024 | Decoder, document understanding | —Unverified | 0 |
| NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | Apr 12, 2025 | Benchmarking, Document AI | —Unverified | 0 |
| Notes on Applicability of GPT-4 to Document Understanding | May 28, 2024 | document understanding, Optical Character Recognition (OCR) | —Unverified | 0 |
| Object-oriented Neural Programming (OONP) for Document Understanding | Sep 26, 2017 | document understanding, Object | —Unverified | 0 |
| One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text | Sep 12, 2022 | document understanding, object-detection | —Unverified | 0 |
| On Scaling Up a Multilingual Vision and Language Model | Jan 1, 2024 | document understanding, In-Context Learning | —Unverified | 0 |
| OPAD: An Optimized Policy-based Active Learning Framework for Document Content Analysis | Oct 1, 2021 | Active Learning, document understanding | —Unverified | 0 |
| PDFVQA: A New Dataset for Real-World VQA on PDF Documents | Apr 13, 2023 | document understanding, Key Information Extraction | —Unverified | 0 |
| Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning | May 26, 2025 | document understanding, Multimodal Reasoning | —Unverified | 0 |
| Position Masking for Improved Layout-Aware Document Understanding | Sep 1, 2021 | document understanding, Position | —Unverified | 0 |
| Probing Position-Aware Attention Mechanism in Long Document Understanding | Nov 16, 2021 | document understanding, Natural Language Understanding | —Unverified | 0 |
| ProtoNER: Few shot Incremental Learning for Named Entity Recognition using Prototypical Networks | Oct 3, 2023 | document understanding, Incremental Learning | —Unverified | 0 |
| PSG: Prompt-based Sequence Generation for Acronym Extraction | Nov 29, 2021 | document understanding, Language Modeling | —Unverified | 0 |
| QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding | Apr 3, 2025 | document understanding, Language Modeling | —Unverified | 0 |
| QueryForm: A Simple Zero-shot Form Entity Query Framework | Nov 14, 2022 | document understanding, Form | —Unverified | 0 |
| RDU: A Region-based Approach to Form-style Document Understanding | Jun 14, 2022 | document understanding, Form | —Unverified | 0 |
| Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API | Oct 7, 2023 | Decoder, document understanding | —Unverified | 0 |
| ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training | Oct 14, 2024 | document understanding, Optical Character Recognition (OCR) | —Unverified | 0 |
| Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use | May 30, 2024 | document understanding, Key Information Extraction | —Unverified | 0 |
| Revisiting Table Detection Datasets for Visually Rich Documents | May 4, 2023 | document understanding, object-detection | —Unverified | 0 |
| RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning | Feb 19, 2024 | document understanding, Medical Diagnosis | —Unverified | 0 |
| Robust Text Line Detection in Historical Documents: Learning and Evaluation Methods | Mar 23, 2022 | document understanding, Line Detection | —Unverified | 0 |