Document Understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers


Showing 51–100 of 309 papers

Title	Date	Tasks	Status	Hype	Score
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding	Feb 28, 2024	document understanding, Information Retrieval	Code Available	1	5
DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding	Aug 27, 2024	document understanding, Optical Character Recognition (OCR)	Code Available	1	5
DANIEL: A fast Document Attention Network for Information Extraction and Labelling of handwritten documents	Jul 12, 2024	Document Layout Analysis, document understanding	Code Available	1	5
FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding	Apr 24, 2025	document understanding, MME	Code Available	1	5
Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs	Nov 22, 2023	document understanding, Instruction Following	Code Available	1	5
Value Retrieval with Arbitrary Queries for Form-like Documents	Dec 15, 2021	document understanding, Form	Code Available	1	5
Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding	Sep 29, 2024	document understanding, Entity Linking	Code Available	1	5
On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling	Jan 25, 2024	Decoder, Diversity	Code Available	1	5
DocFormerv2: Local Features for Document Understanding	Jun 2, 2023	Decoder, document understanding	Code Available	1	5
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding	Oct 12, 2022	Document Image Classification	Code Available	1	5
SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement	Jun 16, 2025	document understanding, Question Answering	Code Available	1	5
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling	Aug 15, 2023	document understanding	Code Available	1	5
Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks	Aug 23, 2022	Document Layout Analysis, document understanding	Code Available	1	5
PaLI-X: On Scaling up a Multilingual Vision and Language Model	May 29, 2023	Chart Question Answering, document understanding	Code Available	1	5
End-to-end Document Recognition and Understanding with Dessurt	Mar 30, 2022	document understanding, Visual Question Answering (VQA)	Code Available	1	5
On Web-based Visual Corpus Construction for Visual Document Understanding	Nov 7, 2022	document understanding, Optical Character Recognition (OCR)	Code Available	1	5
A Discrete Variational Recurrent Topic Model without the Reparametrization Trick	Oct 22, 2020	document understanding, Variational Inference	Code Available	1	5
DocFormer: End-to-End Transformer for Document Understanding	Jun 22, 2021	Document Image Classification, document understanding	Code Available	1	5
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding	Jul 17, 2024	document understanding, Optical Character Recognition (OCR)	Code Available	1	5
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism	Apr 29, 2024	document understanding, GPU	Code Available	0	5
Multimodal Tree Decoder for Table of Contents Extraction in Document Images	Dec 6, 2022	Decoder, document understanding	Code Available	0	5
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report	Jun 17, 2024	document understanding	Code Available	0	5
Multimodal weighted graph representation for information extraction from visually rich documents.	Jan 5, 2024	Document Layout Analysis, document understanding	Code Available	0	5
Deeper Clinical Document Understanding Using Relation Extraction	Dec 25, 2021	document understanding, named-entity-recognition	Code Available	0	5
Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting	May 21, 2024	Document Image Classification	Code Available	0	5
Message Passing Attention Networks for Document Understanding	Aug 17, 2019	document understanding, Multi-Modal Document Classification	Code Available	0	5
Data-driven Coreference-based Ontology Building	Oct 22, 2024	Coreference Resolution	Code Available	0	5
Matching Article Pairs with Graphical Decomposition and Convolutions	Feb 21, 2018	Articles, document understanding	Code Available	0	5
M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization?	Mar 27, 2025	Document Summarization, document understanding	Code Available	0	5
MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding	Oct 16, 2021	document understanding	Code Available	0	5
MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding	May 1, 2022	document understanding	Code Available	0	5
3MVRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding	Feb 28, 2024	document understanding, Form	Code Available	0	5
M^6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis	May 15, 2023	Articles, Document Layout Analysis	Code Available	0	5
DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images	Jun 26, 2025	document understanding, Optical Character Recognition (OCR)	Code Available	0	5
Class-Agnostic Region-of-Interest Matching in Document Images	Jun 26, 2025	Document Layout Analysis, document understanding	Code Available	0	5
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding	Mar 18, 2025	document understanding, Question Answering	Code Available	0	5
ChuLo: Chunk-Level Key Information Representation for Long Document Processing	Oct 14, 2024	Chunking, Classification	Code Available	0	5
Chargrid: Towards Understanding 2D Documents	Sep 24, 2018	Decoder, document understanding	Code Available	0	5
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding	Apr 18, 2021	Document Image Classification, document understanding	Code Available	0	5
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding	Dec 29, 2020	Document Image Classification, Document Layout Analysis	Code Available	0	5
Learned Compression for Compressed Learning	Dec 12, 2024	Colorization, document understanding	Code Available	0	5
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding	Apr 8, 2024	Document AI, document understanding	Code Available	0	5
Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models	Jun 5, 2023	document understanding, Question Answering	Code Available	0	5
Is ChatGPT A Good Keyphrase Generator? A Preliminary Study	Mar 23, 2023	Diversity, document understanding	Code Available	0	5
Information Redundancy and Biases in Public Document Information Extraction Benchmarks	Apr 28, 2023	document understanding, Key Information Extraction	Code Available	0	5
KALM: Knowledge-Aware Integration of Local, Document, and Global Contexts for Long Document Understanding	Oct 8, 2022	document understanding, Knowledge Graphs	Code Available	0	5
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing	Jun 1, 2025	Document AI, document understanding	Code Available	0	5
Improving Clinical Document Understanding on COVID-19 Research with Spark NLP	Dec 7, 2020	Anatomy, Clinical Assertion Status Detection	Code Available	0	5
Machine Unlearning for Document Classification	Apr 29, 2024	Classification, Document Classification	Code Available	0	5
Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network	Sep 11, 2024	Document Layout Analysis, document understanding	Code Available	0	5


No leaderboard results yet.