document understanding

Document understanding involves document classification, layout analysis, information extraction, and DocQA.

Papers

Showing 151–200 of 309 papers

| Title | Status | Hype |
| --- | --- | --- |
| WordScape: a Pipeline to extract multilingual, visually rich Documents with Layout Annotations from Web Crawl Data | Code | 1 |
| Privacy-Aware Document Visual Question Answering | Code | 1 |
| SLJP: Semantic Extraction based Legal Judgment Prediction | | 0 |
| Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs | Code | 1 |
| DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding | | 0 |
| Efficient End-to-End Visual Document Understanding with Rationale Distillation | | 0 |
| DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency | | 0 |
| A Multi-Modal Multilingual Benchmark for Document Image Classification | | 0 |
| DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading | Code | 1 |
| DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond | Code | 0 |
| Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API | | 0 |
| ProtoNER: Few shot Incremental Learning for Named Entity Recognition using Prototypical Networks | | 0 |
| Finding Pragmatic Differences Between Disciplines | | 0 |
| Document Understanding for Healthcare Referrals | | 0 |
| SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap | Code | 0 |
| KOSMOS-2.5: A Multimodal Literate Model | | 0 |
| GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification | | 0 |
| Long-Range Transformer Architectures for Document Understanding | Code | 0 |
| Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration | | 0 |
| Vision Grid Transformer for Document Layout Analysis | Code | 0 |
| Enhancing Visually-Rich Document Understanding via Layout Structure Modeling | Code | 1 |
| Workshop on Document Intelligence Understanding | | 0 |
| MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary | | 0 |
| A Survey and Approach to Chart Classification | | 0 |
| mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding | Code | 0 |
| DocumentNet: Bridging the Data Gap in Document Pre-Training | | 0 |
| DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents | Code | 1 |
| Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models | Code | 0 |
| DocFormerv2: Local Features for Document Understanding | Code | 1 |
| Table Detection for Visually Rich Document Images | Code | 0 |
| LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding | | 0 |
| PaLI-X: On Scaling up a Multilingual Vision and Language Model | Code | 1 |
| Pre-training Meets Clustering: A Hybrid Extractive Multi-document Summarization Model | Code | 0 |
| AWESOME: GPU Memory-constrained Long Document Summarization using Memory Mechanism and Global Salient Content | | 0 |
| Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models | Code | 1 |
| DUBLIN -- Document Understanding By Language-Image Network | | 0 |
| Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding | | 0 |
| Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding | | 0 |
| DLUE: Benchmarking Document Language Understanding | | 0 |
| M^6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | Code | 0 |
| Document Understanding Dataset and Evaluation (DUDE) | Code | 1 |
| Two to Five Truths in Non-Negative Matrix Factorization | | 0 |
| Revisiting Table Detection Datasets for Visually Rich Documents | | 0 |
| FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction | | 0 |
| LineFormer: Rethinking Line Chart Data Extraction as Instance Segmentation | Code | 1 |
| CCpdf: Building a High Quality Corpus for Visually Rich Documents from Web Crawl Data | Code | 1 |
| Information Redundancy and Biases in Public Document Information Extraction Benchmarks | Code | 0 |
| What Makes a Good Dataset for Symbol Description Reading? | | 0 |
| PDFVQA: A New Dataset for Real-World VQA on PDF Documents | | 0 |
| Is ChatGPT A Good Keyphrase Generator? A Preliminary Study | Code | 0 |

No leaderboard results yet.