SOTAVerified

document understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 26–50 of 309 papers

Title | Status | Hype
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding | Code | 1
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding | Code | 1
M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | Code | 1
Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding | Code | 1
End-to-end Document Recognition and Understanding with Dessurt | Code | 1
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark | Code | 1
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling | Code | 1
CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding | Code | 1
LineFormer: Rethinking Line Chart Data Extraction as Instance Segmentation | Code | 1
LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating | Code | 1
Multimodal Pre-training Based on Graph Attention Network for Document Understanding | Code | 1
DocQueryNet: Value Retrieval with Arbitrary Queries for Form-like Documents | Code | 1
CCpdf: Building a High Quality Corpus for Visually Rich Documents from Web Crawl Data | Code | 1
DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading | Code | 1
DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Code | 1
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark | Code | 1
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Code | 1
DocFormer: End-to-End Transformer for Document Understanding | Code | 1
Docopilot: Improving Multimodal Models for Document-Level Understanding | Code | 1
DocFormerv2: Local Features for Document Understanding | Code | 1
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents | Code | 1
LEMONADE: A Large Multilingual Expert-Annotated Abstractive Event Dataset for the Real World | Code | 1
FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding | Code | 1
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | Code | 1
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding | Code | 1
Page 2 of 13

No leaderboard results yet.