document understanding

Document understanding involves document classification, layout analysis, information extraction, and DocQA.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 26–50 of 309 papers

Title	Date	Tasks	Status	Hype
LEMONADE: A Large Multilingual Expert-Annotated Abstractive Event Dataset for the Real World	Jun 1, 2025	document understandingEntity Linking	CodeCode Available	1
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark	May 22, 2025	document understandingMultimodal Reasoning	CodeCode Available	1
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding	May 8, 2025	document understandingInstruction Following	CodeCode Available	1
FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding	Apr 24, 2025	document understandingMME	CodeCode Available	1
Ocean-OCR: Towards General OCR Application via a Vision-Language Model	Jan 26, 2025	document understandingLanguage Modeling	CodeCode Available	1
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding	Jan 1, 2025	document understandingOptical Character Recognition (OCR)	CodeCode Available	1
Docopilot: Improving Multimodal Models for Document-Level Understanding	Jan 1, 2025	document understandingRAG	CodeCode Available	1
LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating	Dec 24, 2024	document understandingQuestion Answering	CodeCode Available	1
Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models	Dec 18, 2024	document understandingImage Captioning	CodeCode Available	1
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark	Oct 24, 2024	document understandingVideo Understanding	CodeCode Available	1
Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding	Sep 29, 2024	document understandingEntity Linking	CodeCode Available	1
DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding	Aug 27, 2024	document understandingOptical Character Recognition (OCR)	CodeCode Available	1
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding	Jul 17, 2024	document understandingOptical Character Recognition (OCR)	CodeCode Available	1
DANIEL: A fast Document Attention Network for Information Extraction and Labelling of handwritten documents	Jul 12, 2024	Document Layout Analysisdocument understanding	CodeCode Available	1
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning	Jun 4, 2024	document understandingGPU	CodeCode Available	1
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding	Feb 28, 2024	document understandingInformation Retrieval	CodeCode Available	1
On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling	Jan 25, 2024	DecoderDiversity	CodeCode Available	1
WordScape: a Pipeline to extract multilingual, visually rich Documents with Layout Annotations from Web Crawl Data	Dec 15, 2023	document understandingQuestion Answering	CodeCode Available	1
Privacy-Aware Document Visual Question Answering	Dec 15, 2023	document understandingFederated Learning	CodeCode Available	1
Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs	Nov 22, 2023	document understandingInstruction Following	CodeCode Available	1
DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading	Oct 23, 2023	Document AIdocument understanding	CodeCode Available	1
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling	Aug 15, 2023	document understanding	CodeCode Available	1
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents	Jun 9, 2023	Contrastive Learningdocument understanding	CodeCode Available	1
DocFormerv2: Local Features for Document Understanding	Jun 2, 2023	Decoderdocument understanding	CodeCode Available	1
PaLI-X: On Scaling up a Multilingual Vision and Language Model	May 29, 2023	Chart Question Answeringdocument understanding	CodeCode Available	1

Show:10 25 50

← PrevPage 2 of 13Next →

No leaderboard results yet.