| Scalable Cross Lingual Pivots to Model Pronoun Gender for Translation | Jun 16, 2020 | document understanding, Machine Translation | Unverified | 0 |
| Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models | Jun 25, 2025 | document understanding, Hallucination | Unverified | 0 |
| Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding | May 16, 2023 | Decoder, document understanding | Unverified | 0 |
| Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI | Feb 24, 2025 | document understanding, Multimodal Reasoning | Unverified | 0 |
| SimCLAD: A Simple Framework for Contrastive Learning of Acronym Disambiguation | Nov 29, 2021 | Contrastive Learning, document understanding | Unverified | 0 |
| SLJP: Semantic Extraction based Legal Judgment Prediction | Dec 13, 2023 | document understanding, Prediction | Unverified | 0 |
| StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training | Nov 25, 2024 | document understanding, Language Modeling | Unverified | 0 |
| Survey on Question Answering over Visually Rich Documents: Methods, Challenges, and Trends | Jan 4, 2025 | document understanding, Question Answering | Unverified | 0 |
| SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding | Aug 27, 2024 | document understanding | Unverified | 0 |
| Table-Of-Contents generation on contemporary documents | Nov 20, 2019 | document understanding | Unverified | 0 |
| Table Structure Extraction with Bi-directional Gated Recurrent Unit Networks | Jan 8, 2020 | document understanding, Optical Character Recognition | Unverified | 0 |
| Test-Time Adaptation for Visual Document Understanding | Jun 15, 2022 | document understanding, Domain Adaptation | Unverified | 0 |
| The Hidden Structure -- Improving Legal Document Understanding Through Explicit Text Formatting | May 19, 2025 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 |
| The Law of Large Documents: Understanding the Structure of Legal Contracts Using Visual Cues | Jul 16, 2021 | Attribute, document understanding | Unverified | 0 |
| The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts | Aug 31, 2024 | document understanding, token-classification | Unverified | 0 |
| TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection | Nov 5, 2024 | document understanding | Unverified | 0 |
| Towards Complex Document Understanding By Discrete Reasoning | Jul 25, 2022 | document understanding, Question Answering | Unverified | 0 |
| Towards Efficient Resume Understanding: A Multi-Granularity Multi-Modal Pre-Training Approach | Apr 13, 2024 | document understanding | Unverified | 0 |
| Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark | Jan 1, 2025 | document understanding, Image Retrieval | Unverified | 0 |
| GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification | Sep 11, 2023 | document-image-classification, Document Image Classification | Unverified | 0 |
| Transformer-based Approach for Document Understanding | Oct 16, 2022 | Decoder, Document Layout Analysis | Unverified | 0 |
| Two to Five Truths in Non-Negative Matrix Factorization | May 6, 2023 | Clustering, document understanding | Unverified | 0 |
| Understanding Long Documents with Different Position-Aware Attentions | Aug 17, 2022 | document understanding, Position | Unverified | 0 |
| UniDoc: Unified Pretraining Framework for Document Understanding | Dec 1, 2021 | document understanding, Self-Supervised Learning | Unverified | 0 |
| Unified Pretraining Framework for Document Understanding | Apr 22, 2022 | Document Layout Analysis, document understanding | Unverified | 0 |
| Unimodal and Multimodal Representation Training for Relation Extraction | Nov 11, 2022 | document understanding, Relation | Unverified | 0 |
| ViRED: Prediction of Visual Relations in Engineering Drawings | Sep 2, 2024 | Decoder, document understanding | Unverified | 0 |
| WebFormer: The Web-page Transformer for Structure Information Extraction | Feb 1, 2022 | Deep Attention, document understanding | Unverified | 0 |
| "What is the value of templates?" Rethinking Document Information Extraction Datasets for LLMs | Oct 20, 2024 | document understanding, Key Information Extraction | Unverified | 0 |
| What Makes a Good Dataset for Symbol Description Reading? | Apr 17, 2023 | document understanding, Math | Unverified | 0 |
| WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts | Jun 18, 2025 | document understanding, Multiple-choice | Unverified | 0 |
| Workshop on Document Intelligence Understanding | Jul 31, 2023 | document understanding, Visual Question Answering (VQA) | Unverified | 0 |
| XFUND: A Benchmark Dataset for Multilingual Visually Rich Form Understanding | May 1, 2022 | document understanding, Form | Unverified | 0 |
| Deep Learning based Visually Rich Document Content Understanding: A Survey | Aug 2, 2024 | Deep Learning, document understanding | Unverified | 0 |
| DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Jun 26, 2025 | document understanding, Optical Character Recognition (OCR) | Code Available | 0 |
| Table Detection for Visually Rich Document Images | May 30, 2023 | document understanding, object-detection | Code Available | 0 |
| Class-Agnostic Region-of-Interest Matching in Document Images | Jun 26, 2025 | Document Layout Analysis, document understanding | Code Available | 0 |
| LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding | Apr 18, 2021 | Document Image Classification, document understanding | Code Available | 0 |
| Learned Compression for Compressed Learning | Dec 12, 2024 | Colorization, document understanding | Code Available | 0 |
| LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | Dec 29, 2020 | Document Image Classification, Document Layout Analysis | Code Available | 0 |
| Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding | Dec 19, 2022 | Contrastive Learning, document understanding | Code Available | 0 |
| DocMIA: Document-Level Membership Inference Attacks against DocVQA Models | Feb 6, 2025 | document understanding, Inference Attack | Code Available | 0 |
| Understood in Translation, Transformers for Domain Understanding | Dec 18, 2020 | document understanding, Translation | Code Available | 0 |
| LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding | Apr 8, 2024 | Document AI, document understanding | Code Available | 0 |
| Knowing Where and What: Unified Word Block Pretraining for Document Understanding | Jul 28, 2022 | Contrastive Learning, document understanding | Code Available | 0 |
| KALM: Knowledge-Aware Integration of Local, Document, and Global Contexts for Long Document Understanding | Oct 8, 2022 | document understanding, Knowledge Graphs | Code Available | 0 |
| Is ChatGPT A Good Keyphrase Generator? A Preliminary Study | Mar 23, 2023 | Diversity, document understanding | Code Available | 0 |
| Information Redundancy and Biases in Public Document Information Extraction Benchmarks | Apr 28, 2023 | document understanding, Key Information Extraction | Code Available | 0 |
| Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models | Jun 5, 2023 | document understanding, Question Answering | Code Available | 0 |
| Long-Range Transformer Architectures for Document Understanding | Sep 11, 2023 | document understanding, Information Retrieval | Code Available | 0 |