Document Understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 251–300 of 309 papers

Title	Date	Tasks	Status
Deeper Clinical Document Understanding Using Relation Extraction	Dec 25, 2021	document understanding, named-entity-recognition	Code Available
DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering	May 1, 2022	document understanding, Open-Domain Question Answering	Code Available
Relation-Rich Visual Document Generator for Visual Information Extraction	Apr 14, 2025	Diversity, document understanding	Code Available
3MVRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding	Feb 28, 2024	document understanding, Form	Code Available
M^6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis	May 15, 2023	Articles, Document Layout Analysis	Code Available
Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network	Sep 11, 2024	Document Layout Analysis, document understanding	Code Available
Machine Unlearning for Document Classification	Apr 29, 2024	Classification, Document Classification	Code Available
MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding	Oct 16, 2021	document understanding	Code Available
MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding	May 1, 2022	document understanding	Code Available
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding	Mar 18, 2025	document understanding, Question Answering	Code Available
XDoc: Unified Pre-training for Cross-Format Document Understanding	Oct 6, 2022	document understanding, Semantic entity labeling	Code Available
Zero-Shot Complex Question-Answering on Long Scientific Documents	Mar 4, 2025	Answer Generation, document understanding	Code Available
Matching Article Pairs with Graphical Decomposition and Convolutions	Feb 21, 2018	Articles, document understanding	Code Available
ChuLo: Chunk-Level Key Information Representation for Long Document Processing	Oct 14, 2024	Chunking, Classification	Code Available
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing	Jun 1, 2025	Document AI, document understanding	Code Available
M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization?	Mar 27, 2025	Document Summarization, document understanding	Code Available
Improving Clinical Document Understanding on COVID-19 Research with Spark NLP	Dec 7, 2020	Anatomy, Clinical Assertion Status Detection	Code Available
Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding	Jul 19, 2024	document understanding, Informativeness	Code Available
BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction	Mar 25, 2025	document understanding, object-detection	Code Available
Message Passing Attention Networks for Document Understanding	Aug 17, 2019	document understanding, Multi-Modal Document Classification	Code Available
Chargrid: Towards Understanding 2D Documents	Sep 24, 2018	Decoder, document understanding	Code Available
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap	Sep 21, 2023	Contrastive Learning, document understanding	Code Available
Hypergraph based Understanding for Document Semantic Entity Recognition	Jul 9, 2024	document understanding	Code Available
HERITAGE: An End-to-End Web Platform for Processing Korean Historical Documents in Hanja	Jan 21, 2025	document understanding, Machine Translation	Code Available
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding	Mar 19, 2024	document understanding, Optical Character Recognition (OCR)	Code Available
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding	Sep 5, 2024	document understanding, GPU	Code Available
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding	Jul 4, 2023	document understanding, Language Modeling	Code Available
A Survey of Deep Learning Approaches for OCR and Document Understanding	Nov 27, 2020	document understanding, Optical Character Recognition (OCR)	Code Available
Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting	May 21, 2024	document-image-classification, Document Image Classification	Code Available
DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding	Jul 14, 2022	document understanding, Optical Character Recognition (OCR)	Code Available
GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding	May 6, 2024	Contrastive Learning, document understanding	Code Available
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report	Jun 17, 2024	document understanding	Code Available
Multimodal Tree Decoder for Table of Contents Extraction in Document Images	Dec 6, 2022	Decoder, document understanding	Code Available
Multimodal weighted graph representation for information extraction from visually rich documents.	Jan 5, 2024	Document Layout Analysis, document understanding	Code Available
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism	Apr 29, 2024	document understanding, GPU	Code Available
SFDLA: Source-Free Document Layout Analysis	Mar 24, 2025	Avg, Document Layout Analysis	Code Available
Blockwise Self-Attention for Long Document Understanding	Nov 7, 2019	document understanding, Language Modeling	Code Available
Data-driven Coreference-based Ontology Building	Oct 22, 2024	coreference-resolution, Coreference Resolution	Code Available
DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond	Oct 19, 2023	Document AI, Document Layout Analysis	Code Available
Financial Report Chunking for Effective Retrieval Augmented Generation	Feb 5, 2024	Chunking, document understanding	Code Available
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition	Mar 28, 2024	Decoder, document understanding	Code Available
OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition	Jan 1, 2024	Decoder, document understanding	Code Available
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models	Feb 22, 2025	document understanding, Key Information Extraction	Code Available
KRED: Knowledge-Aware Document Representation for News Recommendations	Oct 25, 2019	Articles, document understanding	Code Available
Skim-Attention: Learning to Focus via Document Layout	Sep 2, 2021	document understanding, Language Modeling	Code Available
Vision Grid Transformer for Document Layout Analysis	Aug 29, 2023	Document AI, Document Layout Analysis	Code Available
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	Dec 6, 2024	document understanding, Hallucination	Code Available
Long Context Compression with Activation Beacon	Jan 7, 2024	4k, document understanding	Code Available
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models	Apr 16, 2025	document understanding, Layout Design	Code Available
PaddleOCR 3.0 Technical Report	Jul 8, 2025	document understanding, Key Information Extraction	Code Available

