SOTAVerified

document understanding

Document understanding involves document classification, layout analysis, information extraction, and DocQA.

Papers

Showing 201-250 of 309 papers

| Title | Status | Hype |
| --- | --- | --- |
| M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | Code | 1 |
| Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding | Code | 0 |
| Multimodal Tree Decoder for Table of Contents Extraction in Document Images | Code | 0 |
| Unifying Vision, Text, and Layout for Universal Document Processing | Code | 3 |
| ClueWeb22: 10 Billion Web Documents with Visual and Semantic Information | | 0 |
| VRDU: A Benchmark for Visually-rich Document Understanding | | 0 |
| QueryForm: A Simple Zero-shot Form Entity Query Framework | | 0 |
| Unimodal and Multimodal Representation Training for Relation Extraction | | 0 |
| On Web-based Visual Corpus Construction for Visual Document Understanding | Code | 1 |
| Transformer-based Approach for Document Understanding | | 0 |
| ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding | Code | 1 |
| KALM: Knowledge-Aware Integration of Local, Document, and Global Contexts for Long Document Understanding | Code | 0 |
| XDoc: Unified Pre-training for Cross-Format Document Understanding | Code | 0 |
| DocQueryNet: Value Retrieval with Arbitrary Queries for Form-like Documents | Code | 1 |
| ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding | | 0 |
| One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text | | 0 |
| Improving Keyphrase Extraction with Data Augmentation and Information Filtering | | 0 |
| Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks | Code | 1 |
| DeeperDive: The Unreasonable Effectiveness of Weak Supervision in Document Understanding A Case Study in Collaboration with UiPath Inc | | 0 |
| Understanding Long Documents with Different Position-Aware Attentions | | 0 |
| Knowing Where and What: Unified Word Block Pretraining for Document Understanding | Code | 0 |
| Towards Complex Document Understanding By Discrete Reasoning | | 0 |
| DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding | | 0 |
| Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding | | 0 |
| Test-Time Adaptation for Visual Document Understanding | | 0 |
| RDU: A Region-based Approach to Form-style Document Understanding | | 0 |
| Génération de question à partir d’analyse sémantique pour l’adaptation non supervisée de modèles de compréhension de documents (Question generation from semantic analysis for unsupervised adaptation of document understanding models) | | 0 |
| Delivering Document Conversion as a Cloud Service with High Throughput and Responsiveness | Code | 2 |
| MATrIX -- Modality-Aware Transformer for Information eXtraction | | 0 |
| MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding | Code | 0 |
| XFUND: A Benchmark Dataset for Multilingual Visually Rich Form Understanding | | 0 |
| DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering | | 0 |
| Unified Pretraining Framework for Document Understanding | | 0 |
| End-to-end Document Recognition and Understanding with Dessurt | Code | 1 |
| Multimodal Pre-training Based on Graph Attention Network for Document Understanding | Code | 1 |
| Robust Text Line Detection in Historical Documents: Learning and Evaluation Methods | | 0 |
| FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction | | 0 |
| XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding | Code | 1 |
| Hierarchical BERT for Medical Document Understanding | | 0 |
| LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding | Code | 2 |
| WebFormer: The Web-page Transformer for Structure Information Extraction | | 0 |
| ERNIE-Layout: Layout-Knowledge Enhanced Multi-modal Pre-training for Document Understanding | Code | 0 |
| LoPE: Learnable Sinusoidal Positional Encoding for Improving Document Transformer Model | | 0 |
| Efficient layout-aware pretraining for multimodal form understanding | | 0 |
| Deeper Clinical Document Understanding Using Relation Extraction | Code | 0 |
| Value Retrieval with Arbitrary Queries for Form-like Documents | Code | 1 |
| UniDoc: Unified Pretraining Framework for Document Understanding | | 0 |
| OCR-free Document Understanding Transformer | Code | 3 |
| SimCLAD: A Simple Framework for Contrastive Learning of Acronym Disambiguation | | 0 |
| PSG: Prompt-based Sequence Generation for Acronym Extraction | | 0 |

No leaderboard results yet.