document understanding

Document understanding involves document classification, layout analysis, information extraction, and DocQA.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1–25 of 309 papers

Title	Date	Tasks	Status	Hype
A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends	Jul 14, 2025	document understandingOptical Character Recognition	—Unverified	0
PaddleOCR 3.0 Technical Report	Jul 8, 2025	document understandingKey Information Extraction	—Unverified	0
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning	Jul 1, 2025	document understandingMultimodal Reasoning	CodeCode Available	7
Class-Agnostic Region-of-Interest Matching in Document Images	Jun 26, 2025	Document Layout Analysisdocument understanding	CodeCode Available	0
DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images	Jun 26, 2025	document understandingOptical Character Recognition (OCR)	CodeCode Available	0
Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models	Jun 25, 2025	document understandingHallucination	—Unverified	0
PP-DocBee2: Improved Baselines with Efficient Data for Multimodal Document Understanding	Jun 22, 2025	document understanding	—Unverified	0
WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts	Jun 18, 2025	document understandingMultiple-choice	—Unverified	0
SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement	Jun 16, 2025	document understandingQuestion Answering	CodeCode Available	1
A Survey on Vietnamese Document Analysis and Recognition: Challenges and Future Directions	Jun 5, 2025	Computational Efficiencydocument understanding	—Unverified	0
DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning	Jun 5, 2025	document understandingEvent Detection	—Unverified	0
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing	Jun 1, 2025	Document AIdocument understanding	CodeCode Available	0
LEMONADE: A Large Multilingual Expert-Annotated Abstractive Event Dataset for the Real World	Jun 1, 2025	document understandingEntity Linking	CodeCode Available	1
Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning	May 26, 2025	document understandingMultimodal Reasoning	—Unverified	0
MT^3: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning	May 26, 2025	document understandingMachine Translation	—Unverified	0
Doc-CoB: Enhancing Multi-Modal Document Understanding with Visual Chain-of-Boxes Reasoning	May 24, 2025	document understandingVisual Reasoning	—Unverified	0
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark	May 22, 2025	document understandingMultimodal Reasoning	CodeCode Available	1
The Hidden Structure -- Improving Legal Document Understanding Through Explicit Text Formatting	May 19, 2025	document understandingOptical Character Recognition (OCR)	—Unverified	0
WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?	May 16, 2025	document understanding	—Unverified	0
Document Image Rectification Bases on Self-Adaptive Multitask Fusion	May 9, 2025	document understanding	—Unverified	0
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding	May 8, 2025	document understandingInstruction Following	CodeCode Available	1
Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer	May 2, 2025	document understandingHallucination	—Unverified	0
FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding	Apr 24, 2025	document understandingMME	CodeCode Available	1
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models	Apr 16, 2025	document understandingLayout Design	CodeCode Available	0
Relation-Rich Visual Document Generator for Visual Information Extraction	Apr 14, 2025	Diversitydocument understanding	CodeCode Available	0

Show:10 25 50

← PrevPage 1 of 13Next →

No leaderboard results yet.