document understanding

Document understanding involves document classification, layout analysis, information extraction, and DocQA.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 226–250 of 309 papers

Title	Date	Tasks	Status
XFUND: A Benchmark Dataset for Multilingual Visually Rich Form Understanding	May 1, 2022	document understandingForm	—Unverified
Deep Learning based Visually Rich Document Content Understanding: A Survey	Aug 2, 2024	Deep Learningdocument understanding	—Unverified
LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents	Jan 26, 2024	4kDocument AI	—Unverified
LoPE: Learnable Sinusoidal Positional Encoding for Improving Document Transformer Model	Jan 16, 2022	document understanding	—Unverified
LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding	Nov 2, 2024	document understandingQuestion Answering	—Unverified
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding	Nov 7, 2024	document understandingOptical Character Recognition	—Unverified
MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary	Jul 24, 2023	document understandingOptical Character Recognition (OCR)	—Unverified
MATATA: Weakly Supervised End-to-End MAthematical Tool-Augmented Reasoning for Tabular Applications	Nov 28, 2024	document understandingMathematical Reasoning	—Unverified
MATrIX -- Modality-Aware Transformer for Information eXtraction	May 17, 2022	document understanding	—Unverified
Memory-Augmented Agent Training for Business Document Understanding	Dec 17, 2024	document understanding	—Unverified
Merge and Recognize: A Geometry and 2D Context Aware Graph Model for Named Entity Recognition from Visual Documents	Dec 1, 2020	document understandingLanguage Modeling	—Unverified
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework	Nov 9, 2024	document understandingQuestion Answering	—Unverified
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding	Oct 25, 2024	Benchmarkingdocument understanding	—Unverified
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding	Mar 19, 2024	document understandingOptical Character Recognition (OCR)	—Unverified
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding	Sep 5, 2024	document understandingGPU	—Unverified
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding	Jul 4, 2023	document understandingLanguage Modeling	—Unverified
MT^3: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning	May 26, 2025	document understandingMachine Translation	—Unverified
Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web	Jul 1, 2020	document understandingEntity Linking	—Unverified
NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition	Jul 16, 2024	Decoderdocument understanding	—Unverified
NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding	Apr 12, 2025	BenchmarkingDocument AI	—Unverified
Notes on Applicability of GPT-4 to Document Understanding	May 28, 2024	document understandingOptical Character Recognition (OCR)	—Unverified
Object-oriented Neural Programming (OONP) for Document Understanding	Sep 26, 2017	document understandingObject	—Unverified
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition	Mar 28, 2024	Decoderdocument understanding	—Unverified
OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition	Jan 1, 2024	Decoderdocument understanding	—Unverified
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models	Feb 22, 2025	document understandingKey Information Extraction	—Unverified

Show:10 25 50

← PrevPage 10 of 13Next →

No leaderboard results yet.