SOTAVerified

document understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 51–100 of 309 papers

Title | Status | Hype
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding | Code | 1
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark | Code | 1
On Web-based Visual Corpus Construction for Visual Document Understanding | Code | 1
Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding | Code | 1
M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | Code | 1
Multimodal Pre-training Based on Graph Attention Network for Document Understanding | Code | 1
DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Code | 1
LineFormer: Rethinking Line Chart Data Extraction as Instance Segmentation | Code | 1
LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating | Code | 1
Ocean-OCR: Towards General OCR Application via a Vision-Language Model | Code | 1
LEMONADE: A Large Multilingual Expert-Annotated Abstractive Event Dataset for the Real World | Code | 1
A Discrete Variational Recurrent Topic Model without the Reparametrization Trick | Code | 1
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning | Code | 1
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | Code | 1
Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks | Code | 1
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding | Code | 1
FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding | Code | 1
DocFormer: End-to-End Transformer for Document Understanding | Code | 1
DocFormerv2: Local Features for Document Understanding | Code | 1
DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning | | 0
BERT-AL: BERT for Arbitrarily Long Document Understanding | | 0
Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review | | 0
DeeperDive: The Unreasonable Effectiveness of Weak Supervision in Document Understanding A Case Study in Collaboration with UiPath Inc | | 0
AWESOME: GPU Memory-constrained Long Document Summarization using Memory Mechanism and Global Salient Content | | 0
A Retrospective Recount of Computer Architecture Research with a Data-Driven Study of Over Four Decades of ISCA Publications | | 0
Automatic Knowledge Extraction with Human Interface | | 0
Decontextualization: Making Sentences Stand-Alone | | 0
Arctic-TILT. Business Document Understanding at Sub-Billion Scale | | 0
DAViD: Domain Adaptive Visually-Rich Document Understanding with Synthetic Insights | | 0
Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer | | 0
Finding Pragmatic Differences Between Disciplines | | 0
Auto-encodeurs pour la compréhension de documents parlés (Auto-encoders for Spoken Document Understanding) | | 0
A User-Centered Concept Mining System for Query and Document Understanding at Tencent | | 0
CREPE: Coordinate-Aware End-to-End Document Parser | | 0
WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild? | | 0
ClueWeb22: 10 Billion Web Documents with Visual and Semantic Information | | 0
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration | | 0
DrVideo: Document Retrieval Based Long Video Understanding | | 0
Attention-Based Graph Neural Network with Global Context Awareness for Document Understanding | | 0
Acronym Identification and Disambiguation Shared Tasks for Scientific Document Understanding | | 0
Extract with Order for Coherent Multi-Document Summarization | | 0
Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding | | 0
FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction | | 0
A Multi-Modal Multilingual Benchmark for Document Image Classification | | 0
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models | | 0
DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency | | 0
DOGE: Towards Versatile Visual Document Grounding and Referring | | 0
DUBLIN -- Document Understanding By Language-Image Network | | 0
Efficient End-to-End Visual Document Understanding with Rationale Distillation | | 0
A Token-level Text Image Foundation Model for Document Understanding | | 0

No leaderboard results yet.