SOTAVerified

Document Understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 26-50 of 309 papers

Title | Status | Hype
FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding | Code | 1
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding | Code | 1
Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding | Code | 1
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling | Code | 1
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark | Code | 1
LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating | Code | 1
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding | Code | 1
CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding | Code | 1
End-to-end Document Recognition and Understanding with Dessurt | Code | 1
M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | Code | 1
Multimodal Pre-training Based on Graph Attention Network for Document Understanding | Code | 1
LEMONADE: A Large Multilingual Expert-Annotated Abstractive Event Dataset for the Real World | Code | 1
DocQueryNet: Value Retrieval with Arbitrary Queries for Form-like Documents | Code | 1
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning | Code | 1
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Code | 1
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark | Code | 1
DocFormerv2: Local Features for Document Understanding | Code | 1
CCpdf: Building a High Quality Corpus for Visually Rich Documents from Web Crawl Data | Code | 1
Docopilot: Improving Multimodal Models for Document-Level Understanding | Code | 1
DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading | Code | 1
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents | Code | 1
DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Code | 1
Document Understanding Dataset and Evaluation (DUDE) | Code | 1
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding | Code | 1
Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks | Code | 1

No leaderboard results yet.