| Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding | May 19, 2023 | document understanding | —Unverified | 0 | 0 |
| Finding Pragmatic Differences Between Disciplines | Sep 30, 2023 | Diversity, Document Summarization | —Unverified | 0 | 0 |
| FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction | Mar 16, 2022 | Document AI, document understanding | —Unverified | 0 | 0 |
| FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction | May 4, 2023 | Contrastive Learning, document understanding | —Unverified | 0 | 0 |
| Friendly Topic Assistant for Transformer Based Abstractive Summarization | Nov 1, 2020 | Abstractive Text Summarization, Document Summarization | —Unverified | 0 | 0 |
| From Entity Linking to Question Answering -- Recent Progress on Semantic Grounding Tasks | Dec 1, 2016 | document understanding, Entity Linking | —Unverified | 0 | 0 |
| Génération de question à partir d’analyse sémantique pour l’adaptation non supervisée de modèles de compréhension de documents (Question generation from semantic analysis for unsupervised adaptation of document understanding models) | Jun 1, 2022 | document understanding, Question Generation | —Unverified | 0 | 0 |
| Graph Convolution for Multimodal Information Extraction from Visually Rich Documents | Mar 27, 2019 | document understanding, Entity Extraction using GAN | —Unverified | 0 | 0 |
| Handling tree-structured text: parsing directory pages | Nov 24, 2021 | document understanding | —Unverified | 0 | 0 |
| Harnessing Webpage UIs for Text-Rich Visual Understanding | Oct 17, 2024 | document understanding, Optical Character Recognition (OCR) | —Unverified | 0 | 0 |
| Hierarchical BERT for Medical Document Understanding | Mar 11, 2022 | document understanding, Sentence | —Unverified | 0 | 0 |
| Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models | Sep 18, 2020 | Decoder, Dialogue Generation | —Unverified | 0 | 0 |
| Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding | Nov 8, 2024 | document understanding, Optical Character Recognition (OCR) | —Unverified | 0 | 0 |
| How does Watermarking Affect Visual Language Models in Document Understanding? | Apr 1, 2025 | document understanding | —Unverified | 0 | 0 |
| HRVDA: High-Resolution Visual Document Assistant | Apr 10, 2024 | document understanding | —Unverified | 0 | 0 |
| Improving Applicability of Deep Learning based Token Classification models during Training | Mar 28, 2025 | document understanding, token-classification | —Unverified | 0 | 0 |
| Improving Keyphrase Extraction with Data Augmentation and Information Filtering | Sep 11, 2022 | Data Augmentation, document understanding | —Unverified | 0 | 0 |
| Information Extraction from Heterogeneous Documents without Ground Truth Labels using Synthetic Label Generation and Knowledge Distillation | Nov 22, 2024 | Anomaly Detection, document understanding | —Unverified | 0 | 0 |
| Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding | Nov 12, 2024 | document understanding, Optical Character Recognition (OCR) | —Unverified | 0 | 0 |
| Joint Structured Learning and Predictions under Logical Constraints in Conditional Random Fields | Aug 25, 2017 | BIG-bench Machine Learning, document understanding | —Unverified | 0 | 0 |
| KeyVec: Key-semantics Preserving Document Representations | Sep 27, 2017 | BIG-bench Machine Learning, document understanding | —Unverified | 0 | 0 |
| KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding | Feb 20, 2025 | document understanding, Optical Character Recognition | —Unverified | 0 | 0 |
| KOSMOS-2.5: A Multimodal Literate Model | Sep 20, 2023 | document understanding, model | —Unverified | 0 | 0 |
| LAMPRET: Layout-Aware Multimodal PreTraining for Document Understanding | Apr 16, 2021 | document understanding | —Unverified | 0 | 0 |
| LAPDoc: Layout-Aware Prompting for Documents | Feb 15, 2024 | document understanding, Key Information Extraction | —Unverified | 0 | 0 |
| LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding | Mar 21, 2024 | document-image-classification, Document Image Classification | —Unverified | 0 | 0 |
| LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding | May 30, 2023 | document-image-classification, Document Image Classification | —Unverified | 0 | 0 |
| Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5 | Sep 17, 2024 | document understanding, Transfer Learning | —Unverified | 0 | 0 |
| Leveraging Domain Agnostic and Specific Knowledge for Acronym Disambiguation | Jul 1, 2021 | document understanding, Word Embeddings | —Unverified | 0 | 0 |
| Leveraging Long-Context Large Language Models for Multi-Document Understanding and Summarization in Enterprise Applications | Sep 27, 2024 | Diversity, Document Summarization | —Unverified | 0 | 0 |
| LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents | Jan 26, 2024 | 4k, Document AI | —Unverified | 0 | 0 |
| LoPE: Learnable Sinusoidal Positional Encoding for Improving Document Transformer Model | Jan 16, 2022 | document understanding | —Unverified | 0 | 0 |
| LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding | Nov 2, 2024 | document understanding, Question Answering | —Unverified | 0 | 0 |
| M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | Nov 7, 2024 | document understanding, Optical Character Recognition | —Unverified | 0 | 0 |
| MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary | Jul 24, 2023 | document understanding, Optical Character Recognition (OCR) | —Unverified | 0 | 0 |
| MATATA: Weakly Supervised End-to-End MAthematical Tool-Augmented Reasoning for Tabular Applications | Nov 28, 2024 | document understanding, Mathematical Reasoning | —Unverified | 0 | 0 |
| MATrIX -- Modality-Aware Transformer for Information eXtraction | May 17, 2022 | document understanding | —Unverified | 0 | 0 |
| Memory-Augmented Agent Training for Business Document Understanding | Dec 17, 2024 | document understanding | —Unverified | 0 | 0 |
| Merge and Recognize: A Geometry and 2D Context Aware Graph Model for Named Entity Recognition from Visual Documents | Dec 1, 2020 | document understanding, Language Modeling | —Unverified | 0 | 0 |
| M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework | Nov 9, 2024 | document understanding, Question Answering | —Unverified | 0 | 0 |
| MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding | Oct 25, 2024 | Benchmarking, document understanding | —Unverified | 0 | 0 |
| mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding | Mar 19, 2024 | document understanding, Optical Character Recognition (OCR) | —Unverified | 0 | 0 |
| mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding | Sep 5, 2024 | document understanding, GPU | —Unverified | 0 | 0 |
| mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding | Jul 4, 2023 | document understanding, Language Modeling | —Unverified | 0 | 0 |
| MT^3: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning | May 26, 2025 | document understanding, Machine Translation | —Unverified | 0 | 0 |
| Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web | Jul 1, 2020 | document understanding, Entity Linking | —Unverified | 0 | 0 |
| NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition | Jul 16, 2024 | Decoder, document understanding | —Unverified | 0 | 0 |
| NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | Apr 12, 2025 | Benchmarking, Document AI | —Unverified | 0 | 0 |
| Notes on Applicability of GPT-4 to Document Understanding | May 28, 2024 | document understanding, Optical Character Recognition (OCR) | —Unverified | 0 | 0 |
| Object-oriented Neural Programming (OONP) for Document Understanding | Sep 26, 2017 | document understanding, Object | —Unverified | 0 | 0 |