SOTAVerified

Document Understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 51-100 of 309 papers

Title | Status | Hype
Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models | Code | 1
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark | Code | 1
DANIEL: A fast Document Attention Network for Information Extraction and Labelling of handwritten documents | Code | 1
SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement | Code | 1
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding | Code | 1
FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding | Code | 1
Privacy-Aware Document Visual Question Answering | Code | 1
Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs | Code | 1
DocFormerv2: Local Features for Document Understanding | Code | 1
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding | Code | 1
PaLI-X: On Scaling up a Multilingual Vision and Language Model | Code | 1
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Code | 1
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling | Code | 1
Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks | Code | 1
Ocean-OCR: Towards General OCR Application via a Vision-Language Model | Code | 1
End-to-end Document Recognition and Understanding with Dessurt | Code | 1
A Discrete Variational Recurrent Topic Model without the Reparametrization Trick | Code | 1
DocFormer: End-to-End Transformer for Document Understanding | Code | 1
Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution | Code | 1
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding | Code | 0
Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting | Code | 0
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding | Code | 0
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding | Code | 0
Deeper Clinical Document Understanding Using Relation Extraction | Code | 0
Message Passing Attention Networks for Document Understanding | Code | 0
DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding | Code | 0
M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization? | Code | 0
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report | Code | 0
Data-driven Coreference-based Ontology Building | Code | 0
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Code | 0
MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding | Code | 0
Matching Article Pairs with Graphical Decomposition and Convolutions | Code | 0
M^6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | Code | 0
Machine Unlearning for Document Classification | Code | 0
Long-Range Transformer Architectures for Document Understanding | Code | 0
DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Code | 0
Class-Agnostic Region-of-Interest Matching in Document Images | Code | 0
3MVRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding | Code | 0
MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding | Code | 0
Multimodal Tree Decoder for Table of Contents Extraction in Document Images | Code | 0
Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models | Code | 0
ChuLo: Chunk-Level Key Information Representation for Long Document Processing | Code | 0
Chargrid: Towards Understanding 2D Documents | Code | 0
DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond | Code | 0
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | Code | 0
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding | Code | 0
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding | Code | 0
Knowing Where and What: Unified Word Block Pretraining for Document Understanding | Code | 0
Learned Compression for Compressed Learning | Code | 0
Information Redundancy and Biases in Public Document Information Extraction Benchmarks | Code | 0

No leaderboard results yet.