document understanding

Document understanding involves document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 1–25 of 309 papers

Title	Date	Tasks	Status	Hype
Qwen2.5-VL Technical Report	Feb 19, 2025	document understanding	Code Available	11
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception	Oct 16, 2024	Document Layout Analysis, document understanding	Code Available	9
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning	Jul 1, 2025	document understanding, Multimodal Reasoning	Code Available	7
ColPali: Efficient Document Retrieval with Vision Language Models	Jun 27, 2024	document understanding, RAG	Code Available	7
Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid	Aug 4, 2024	document understanding	Code Available	5
Focus Anywhere for Fine-grained Multi-page Document Understanding	May 23, 2024	document understanding, Optical Character Recognition (OCR)	Code Available	5
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document	Mar 7, 2024	document understanding, Key Information Extraction	Code Available	5
LLMMapReduce: Simplified Long-Sequence Processing using Large Language Models	Oct 12, 2024	document understanding	Code Available	4
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding	Mar 18, 2025	document understanding, Question Answering	Code Available	3
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution	Sep 19, 2024	document understanding, Video Question Answering	Code Available	3
INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning	Jan 12, 2024	Diversity, document understanding	Code Available	3
Unifying Vision, Text, and Layout for Universal Document Processing	Dec 5, 2022	Document AI, document understanding	Code Available	3
OCR-free Document Understanding Transformer	Nov 30, 2021	Document Image Classification, document understanding	Code Available	3
AIN: The Arabic INclusive Large Multimodal Model	Jan 31, 2025	document understanding, model	Code Available	2
Arabic-Nougat: Fine-Tuning Vision Transformers for Arabic OCR and Markdown Extraction	Nov 19, 2024	document understanding, Optical Character Recognition (OCR)	Code Available	2
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling	Oct 8, 2024	document understanding, Language Modeling	Code Available	2
One missing piece in Vision and Language: A Survey on Comics Understanding	Sep 14, 2024	document understanding, image-classification	Code Available	2
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding	Jul 2, 2024	document understanding, Key Information Extraction	Code Available	2
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations	Jul 1, 2024	Benchmarking, document understanding	Code Available	2
Visually Guided Generative Text-Layout Pre-training for Document Intelligence	Mar 25, 2024	Document Classification, document understanding	Code Available	2
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions	Jan 24, 2024	document understanding, Question Answering	Code Available	2
Delivering Document Conversion as a Cloud Service with High Throughput and Responsiveness	Jun 1, 2022	CPU, document understanding	Code Available	2
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding	Feb 28, 2022	Document Image Classification, document understanding	Code Available	2
ICDAR 2021 Competition on Scientific Literature Parsing	Jun 8, 2021	document understanding, object-detection	Code Available	2
SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement	Jun 16, 2025	document understanding, Question Answering	Code Available	1

No leaderboard results yet.