document understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).
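To make the four subtasks concrete, here is a minimal sketch of a record type that groups what each one typically produces for a single document. The class names (`LayoutBlock`, `DocumentAnalysis`), their fields, and the example values are hypothetical illustrations, not the schema of any paper or library listed below.

```python
from dataclasses import dataclass, field


@dataclass
class LayoutBlock:
    """One page region found by layout analysis (hypothetical schema)."""
    label: str                                # e.g. "title", "paragraph", "table"
    bbox: tuple[float, float, float, float]   # (x0, y0, x1, y1) in page coordinates
    text: str = ""                            # recognized text, if any


@dataclass
class DocumentAnalysis:
    """Hypothetical container for the outputs of the four subtasks."""
    doc_class: str                                              # document classification
    layout: list[LayoutBlock] = field(default_factory=list)     # layout analysis
    fields: dict[str, str] = field(default_factory=dict)        # information extraction
    qa: dict[str, str] = field(default_factory=dict)            # DocQA: question -> answer

    def blocks_with_label(self, label: str) -> list[LayoutBlock]:
        """Return the layout blocks carrying the given region label."""
        return [b for b in self.layout if b.label == label]


# Usage with made-up values for an invoice.
result = DocumentAnalysis(
    doc_class="invoice",
    layout=[
        LayoutBlock("title", (50.0, 40.0, 550.0, 80.0), "INVOICE"),
        LayoutBlock("table", (50.0, 200.0, 550.0, 600.0)),
    ],
    fields={"invoice_number": "INV-001", "total": "120.00"},
    qa={"What is the total amount due?": "120.00"},
)
print(len(result.blocks_with_label("table")))  # 1
```

Keeping one record per document makes it easy to see how the subtasks relate: extraction and DocQA answers can be traced back to the layout regions they came from.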

Papers


Showing 26–50 of 309 papers

Title	Date	Tasks	Status	Hype
NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding	Apr 12, 2025	Benchmarking, Document AI	Unverified	0
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding	Apr 3, 2025	document understanding, Language Modeling	Unverified	0
How does Watermarking Affect Visual Language Models in Document Understanding?	Apr 1, 2025	document understanding	Unverified	0
Improving Applicability of Deep Learning based Token Classification models during Training	Mar 28, 2025	document understanding, token-classification	Unverified	0
M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization?	Mar 27, 2025	Document Summarization, document understanding	Code Available	0
BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction	Mar 25, 2025	document understanding, object-detection	Code Available	0
SFDLA: Source-Free Document Layout Analysis	Mar 24, 2025	Avg, Document Layout Analysis	Code Available	0
A Simple yet Effective Layout Token in Large Language Models for Document Understanding	Mar 24, 2025	document understanding, Position	Unverified	0
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding	Mar 18, 2025	document understanding, Question Answering	Code Available	3
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding	Mar 18, 2025	document understanding, Question Answering	Code Available	0
PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks	Mar 6, 2025	document understanding, Language Modeling	Code Available	0
Zero-Shot Complex Question-Answering on Long Scientific Documents	Mar 4, 2025	Answer Generation, document understanding	Code Available	0
A Token-level Text Image Foundation Model for Document Understanding	Mar 4, 2025	document understanding, Visual Question Answering (VQA)	Unverified	0
Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI	Feb 24, 2025	document understanding, Multimodal Reasoning	Unverified	0
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models	Feb 22, 2025	document understanding, Key Information Extraction	Code Available	0
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding	Feb 20, 2025	document understanding, Optical Character Recognition	Unverified	0
Qwen2.5-VL Technical Report	Feb 19, 2025	document understanding	Code Available	11
Assessing Generative AI value in a public sector context: evidence from a field experiment	Feb 13, 2025	document understanding	Unverified	0
DocMIA: Document-Level Membership Inference Attacks against DocVQA Models	Feb 6, 2025	document understanding, Inference Attack	Code Available	0
AIN: The Arabic INclusive Large Multimodal Model	Jan 31, 2025	document understanding, model	Code Available	2
Ocean-OCR: Towards General OCR Application via a Vision-Language Model	Jan 26, 2025	document understanding, Language Modeling	Code Available	1
HERITAGE: An End-to-End Web Platform for Processing Korean Historical Documents in Hanja	Jan 21, 2025	document understanding, Machine Translation	Code Available	0
BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations	Jan 6, 2025	Document AI, document understanding	Unverified	0
Survey on Question Answering over Visually Rich Documents: Methods, Challenges, and Trends	Jan 4, 2025	document understanding, Question Answering	Unverified	0
Docopilot: Improving Multimodal Models for Document-Level Understanding	Jan 1, 2025	document understanding, RAG	Code Available	1


Page 2 of 13

No leaderboard results yet.