document understanding

Document understanding involves document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 51–100 of 309 papers

Title	Date	Tasks	Status	Hype
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding	Oct 12, 2022	Document Image Classification	Code Available	1
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark	May 22, 2025	document understanding, Multimodal Reasoning	Code Available	1
On Web-based Visual Corpus Construction for Visual Document Understanding	Nov 7, 2022	document understanding, Optical Character Recognition (OCR)	Code Available	1
Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding	Sep 29, 2024	document understanding, Entity Linking	Code Available	1
M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis	Jan 1, 2023	Articles, Document Layout Analysis	Code Available	1
Multimodal Pre-training Based on Graph Attention Network for Document Understanding	Mar 25, 2022	document understanding, Graph Attention	Code Available	1
DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding	Aug 27, 2024	document understanding, Optical Character Recognition (OCR)	Code Available	1
LineFormer: Rethinking Line Chart Data Extraction as Instance Segmentation	May 3, 2023	Data Visualization, document understanding	Code Available	1
LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating	Dec 24, 2024	document understanding, Question Answering	Code Available	1
Ocean-OCR: Towards General OCR Application via a Vision-Language Model	Jan 26, 2025	document understanding, Language Modeling	Code Available	1
LEMONADE: A Large Multilingual Expert-Annotated Abstractive Event Dataset for the Real World	Jun 1, 2025	document understanding, Entity Linking	Code Available	1
A Discrete Variational Recurrent Topic Model without the Reparametrization Trick	Oct 22, 2020	document understanding, Variational Inference	Code Available	1
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning	Jun 4, 2024	document understanding, GPU	Code Available	1
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer	Feb 18, 2021	Decoder, Document Image Classification	Code Available	1
Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks	Aug 23, 2022	Document Layout Analysis, document understanding	Code Available	1
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding	Feb 28, 2024	document understanding, Information Retrieval	Code Available	1
FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding	Apr 24, 2025	document understanding, MME	Code Available	1
DocFormer: End-to-End Transformer for Document Understanding	Jun 22, 2021	Document Image Classification, document understanding	Code Available	1
DocFormerv2: Local Features for Document Understanding	Jun 2, 2023	Decoder, document understanding	Code Available	1
DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning	Jun 5, 2025	document understanding, Event Detection	Unverified	0
BERT-AL: BERT for Arbitrarily Long Document Understanding	Jan 1, 2020	document understanding, Text Summarization	Unverified	0
Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review	Jul 23, 2024	Deep Learning, document understanding	Unverified	0
DeeperDive: The Unreasonable Effectiveness of Weak Supervision in Document Understanding A Case Study in Collaboration with UiPath Inc	Aug 17, 2022	document understanding, Form	Unverified	0
AWESOME: GPU Memory-constrained Long Document Summarization using Memory Mechanism and Global Salient Content	May 24, 2023	Document Summarization, document understanding	Unverified	0
A Retrospective Recount of Computer Architecture Research with a Data-Driven Study of Over Four Decades of ISCA Publications	Jun 22, 2019	document understanding, Natural Language Understanding	Unverified	0
Automatic Knowledge Extraction with Human Interface	Apr 9, 2021	document understanding	Unverified	0
Decontextualization: Making Sentences Stand-Alone	Feb 9, 2021	document understanding, Question Answering	Unverified	0
Arctic-TILT. Business Document Understanding at Sub-Billion Scale	Aug 8, 2024	document understanding, GPU	Unverified	0
DAViD: Domain Adaptive Visually-Rich Document Understanding with Synthetic Insights	Oct 2, 2024	document understanding, Domain Adaptation	Unverified	0
Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer	May 2, 2025	document understanding, Hallucination	Unverified	0
Finding Pragmatic Differences Between Disciplines	Sep 30, 2023	Diversity, Document Summarization	Unverified	0
Auto-encodeurs pour la compréhension de documents parlés (Auto-encoders for Spoken Document Understanding)	Jul 1, 2016	document understanding	Unverified	0
A User-Centered Concept Mining System for Query and Document Understanding at Tencent	May 21, 2019	document understanding, Knowledge Base Construction	Unverified	0
CREPE: Coordinate-Aware End-to-End Document Parser	May 1, 2024	document understanding, Optical Character Recognition (OCR)	Unverified	0
WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?	May 16, 2025	document understanding	Unverified	0
ClueWeb22: 10 Billion Web Documents with Visual and Semantic Information	Nov 29, 2022	document understanding, Retrieval	Unverified	0
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration	Sep 3, 2023	Decoder, document understanding	Unverified	0
DrVideo: Document Retrieval Based Long Video Understanding	Jun 18, 2024	document understanding, EgoSchema	Unverified	0
Attention-Based Graph Neural Network with Global Context Awareness for Document Understanding	Oct 1, 2020	document understanding, graph construction	Unverified	0
Acronym Identification and Disambiguation Shared Tasks for Scientific Document Understanding	Dec 22, 2020	document understanding	Unverified	0
Extract with Order for Coherent Multi-Document Summarization	Jun 12, 2017	Document Summarization, document understanding	Unverified	0
Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding	May 19, 2023	document understanding	Unverified	0
FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction	Mar 16, 2022	Document AI, document understanding	Unverified	0
A Multi-Modal Multilingual Benchmark for Document Image Classification	Oct 25, 2023	Classification, Cross-Lingual Transfer	Unverified	0
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models	Feb 29, 2024	Contrastive Learning, document understanding	Unverified	0
DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency	Nov 9, 2023	document understanding, Key Information Extraction	Unverified	0
DOGE: Towards Versatile Visual Document Grounding and Referring	Nov 26, 2024	document understanding	Unverified	0
DUBLIN -- Document Understanding By Language-Image Network	May 23, 2023	Document Classification, document understanding	Unverified	0
Efficient End-to-End Visual Document Understanding with Rationale Distillation	Nov 16, 2023	document understanding, Image to text	Unverified	0
A Token-level Text Image Foundation Model for Document Understanding	Mar 4, 2025	document understanding, Visual Question Answering (VQA)	Unverified	0
