Document Understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers


Showing 26–50 of 309 papers

Title	Date	Tasks	Status	Hype	Score
FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding	Apr 24, 2025	document understanding, MME	Code Available	1	5
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding	May 8, 2025	document understanding, Instruction Following	Code Available	1	5
Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding	Sep 29, 2024	document understanding, Entity Linking	Code Available	1	5
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling	Aug 15, 2023	document understanding	Code Available	1	5
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark	May 22, 2025	document understanding, Multimodal Reasoning	Code Available	1	5
LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating	Dec 24, 2024	document understanding, Question Answering	Code Available	1	5
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding	Oct 12, 2022	Document Image Classification	Code Available	1	5
CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding	May 23, 2021	document understanding, Domain Adaptation	Code Available	1	5
End-to-end Document Recognition and Understanding with Dessurt	Mar 30, 2022	document understanding, Visual Question Answering (VQA)	Code Available	1	5
M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis	Jan 1, 2023	Articles, Document Layout Analysis	Code Available	1	5
Multimodal Pre-training Based on Graph Attention Network for Document Understanding	Mar 25, 2022	document understanding, Graph Attention	Code Available	1	5
LEMONADE: A Large Multilingual Expert-Annotated Abstractive Event Dataset for the Real World	Jun 1, 2025	document understanding, Entity Linking	Code Available	1	5
DocQueryNet: Value Retrieval with Arbitrary Queries for Form-like Documents	Oct 1, 2022	document understanding, Form	Code Available	1	5
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning	Jun 4, 2024	document understanding, GPU	Code Available	1	5
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding	Jan 1, 2025	document understanding, Optical Character Recognition (OCR)	Code Available	1	5
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark	Oct 24, 2024	document understanding, Video Understanding	Code Available	1	5
DocFormerv2: Local Features for Document Understanding	Jun 2, 2023	Decoder, document understanding	Code Available	1	5
CCpdf: Building a High Quality Corpus for Visually Rich Documents from Web Crawl Data	Apr 28, 2023	document understanding, Language Modeling	Code Available	1	5
Docopilot: Improving Multimodal Models for Document-Level Understanding	Jan 1, 2025	document understanding, RAG	Code Available	1	5
DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading	Oct 23, 2023	Document AI, document understanding	Code Available	1	5
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents	Jun 9, 2023	Contrastive Learning, document understanding	Code Available	1	5
DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding	Aug 27, 2024	document understanding, Optical Character Recognition (OCR)	Code Available	1	5
Document Understanding Dataset and Evaluation (DUDE)	May 15, 2023	Document AI, document understanding	Code Available	1	5
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding	Feb 28, 2024	document understanding, Information Retrieval	Code Available	1	5
Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks	Aug 23, 2022	Document Layout Analysis, document understanding	Code Available	1	5

