
document understanding

Document understanding involves document classification, layout analysis, information extraction, and DocQA.

Papers

Showing 51-75 of 309 papers

Title | Status | Hype
Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models | Code | 1
Document Understanding Dataset and Evaluation (DUDE) | Code | 1
LineFormer: Rethinking Line Chart Data Extraction as Instance Segmentation | Code | 1
CCpdf: Building a High Quality Corpus for Visually Rich Documents from Web Crawl Data | Code | 1
M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | Code | 1
On Web-based Visual Corpus Construction for Visual Document Understanding | Code | 1
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding | Code | 1
DocQueryNet: Value Retrieval with Arbitrary Queries for Form-like Documents | Code | 1
Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks | Code | 1
End-to-end Document Recognition and Understanding with Dessurt | Code | 1
Multimodal Pre-training Based on Graph Attention Network for Document Understanding | Code | 1
XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding | Code | 1
Value Retrieval with Arbitrary Queries for Form-like Documents | Code | 1
DocFormer: End-to-End Transformer for Document Understanding | Code | 1
CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding | Code | 1
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | Code | 1
Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution | Code | 1
A Discrete Variational Recurrent Topic Model without the Reparametrization Trick | Code | 1
MedICaT: A Dataset of Medical Images, Captions, and Textual References | Code | 1
A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends | | 0
PaddleOCR 3.0 Technical Report | Code | 0
DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Code | 0
Class-Agnostic Region-of-Interest Matching in Document Images | Code | 0
Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models | | 0
PP-DocBee2: Improved Baselines with Efficient Data for Multimodal Document Understanding | Code | 0

No leaderboard results yet.