document understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers


Showing 201–250 of 309 papers

Title	Date	Tasks	Status
Scalable Cross Lingual Pivots to Model Pronoun Gender for Translation	Jun 16, 2020	document understanding, Machine Translation	Unverified
Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models	Jun 25, 2025	document understanding, Hallucination	Unverified
Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding	May 16, 2023	Decoder, document understanding	Unverified
Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI	Feb 24, 2025	document understanding, Multimodal Reasoning	Unverified
SimCLAD: A Simple Framework for Contrastive Learning of Acronym Disambiguation	Nov 29, 2021	Contrastive Learning, document understanding	Unverified
SLJP: Semantic Extraction based Legal Judgment Prediction	Dec 13, 2023	document understanding, Prediction	Unverified
StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training	Nov 25, 2024	document understanding, Language Modeling	Unverified
Survey on Question Answering over Visually Rich Documents: Methods, Challenges, and Trends	Jan 4, 2025	document understanding, Question Answering	Unverified
SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding	Aug 27, 2024	document understanding	Unverified
Table-Of-Contents generation on contemporary documents	Nov 20, 2019	document understanding	Unverified
Table Structure Extraction with Bi-directional Gated Recurrent Unit Networks	Jan 8, 2020	document understanding, Optical Character Recognition	Unverified
Test-Time Adaptation for Visual Document Understanding	Jun 15, 2022	document understanding, Domain Adaptation	Unverified
The Hidden Structure -- Improving Legal Document Understanding Through Explicit Text Formatting	May 19, 2025	document understanding, Optical Character Recognition (OCR)	Unverified
The Law of Large Documents: Understanding the Structure of Legal Contracts Using Visual Cues	Jul 16, 2021	Attribute, document understanding	Unverified
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts	Aug 31, 2024	document understanding, token-classification	Unverified
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection	Nov 5, 2024	document understanding	Unverified
Towards Complex Document Understanding By Discrete Reasoning	Jul 25, 2022	document understanding, Question Answering	Unverified
Towards Efficient Resume Understanding: A Multi-Granularity Multi-Modal Pre-Training Approach	Apr 13, 2024	document understanding	Unverified
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark	Jan 1, 2025	document understanding, Image Retrieval	Unverified
GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification	Sep 11, 2023	document-image-classification, Document Image Classification	Unverified
Transformer-based Approach for Document Understanding	Oct 16, 2022	Decoder, Document Layout Analysis	Unverified
Two to Five Truths in Non-Negative Matrix Factorization	May 6, 2023	Clustering, document understanding	Unverified
Understanding Long Documents with Different Position-Aware Attentions	Aug 17, 2022	document understanding, Position	Unverified
UniDoc: Unified Pretraining Framework for Document Understanding	Dec 1, 2021	document understanding, Self-Supervised Learning	Unverified
Unified Pretraining Framework for Document Understanding	Apr 22, 2022	Document Layout Analysis, document understanding	Unverified
Unimodal and Multimodal Representation Training for Relation Extraction	Nov 11, 2022	document understanding, Relation	Unverified
ViRED: Prediction of Visual Relations in Engineering Drawings	Sep 2, 2024	Decoder, document understanding	Unverified
WebFormer: The Web-page Transformer for Structure Information Extraction	Feb 1, 2022	Deep Attention, document understanding	Unverified
"What is the value of templates?" Rethinking Document Information Extraction Datasets for LLMs	Oct 20, 2024	document understanding, Key Information Extraction	Unverified
What Makes a Good Dataset for Symbol Description Reading?	Apr 17, 2023	document understanding, Math	Unverified
WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts	Jun 18, 2025	document understanding, Multiple-choice	Unverified
Workshop on Document Intelligence Understanding	Jul 31, 2023	document understanding, Visual Question Answering (VQA)	Unverified
XFUND: A Benchmark Dataset for Multilingual Visually Rich Form Understanding	May 1, 2022	document understanding, Form	Unverified
Deep Learning based Visually Rich Document Content Understanding: A Survey	Aug 2, 2024	Deep Learning, document understanding	Unverified
DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images	Jun 26, 2025	document understanding, Optical Character Recognition (OCR)	Code Available
Table Detection for Visually Rich Document Images	May 30, 2023	document understanding, object-detection	Code Available
Class-Agnostic Region-of-Interest Matching in Document Images	Jun 26, 2025	Document Layout Analysis, document understanding	Code Available
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding	Apr 18, 2021	Document Image Classification, document understanding	Code Available
Learned Compression for Compressed Learning	Dec 12, 2024	Colorization, document understanding	Code Available
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding	Dec 29, 2020	Document Image Classification, Document Layout Analysis	Code Available
Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding	Dec 19, 2022	Contrastive Learning, document understanding	Code Available
DocMIA: Document-Level Membership Inference Attacks against DocVQA Models	Feb 6, 2025	document understanding, Inference Attack	Code Available
Understood in Translation, Transformers for Domain Understanding	Dec 18, 2020	document understanding, Translation	Code Available
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding	Apr 8, 2024	Document AI, document understanding	Code Available
Knowing Where and What: Unified Word Block Pretraining for Document Understanding	Jul 28, 2022	Contrastive Learning, document understanding	Code Available
KALM: Knowledge-Aware Integration of Local, Document, and Global Contexts for Long Document Understanding	Oct 8, 2022	document understanding, Knowledge Graphs	Code Available
Is ChatGPT A Good Keyphrase Generator? A Preliminary Study	Mar 23, 2023	Diversity, document understanding	Code Available
Information Redundancy and Biases in Public Document Information Extraction Benchmarks	Apr 28, 2023	document understanding, Key Information Extraction	Code Available
Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models	Jun 5, 2023	document understanding, Question Answering	Code Available
Long-Range Transformer Architectures for Document Understanding	Sep 11, 2023	document understanding, Information Retrieval	Code Available

