SOTAVerified

Document Understanding

Document understanding involves document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 26–50 of 309 papers

Title | Status | Hype
NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | - | 0
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding | - | 0
How does Watermarking Affect Visual Language Models in Document Understanding? | - | 0
Improving Applicability of Deep Learning based Token Classification models during Training | - | 0
M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization? | Code | 0
BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction | Code | 0
SFDLA: Source-Free Document Layout Analysis | Code | 0
A Simple yet Effective Layout Token in Large Language Models for Document Understanding | - | 0
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding | Code | 3
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Code | 0
PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks | Code | 0
Zero-Shot Complex Question-Answering on Long Scientific Documents | Code | 0
A Token-level Text Image Foundation Model for Document Understanding | - | 0
Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI | - | 0
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models | Code | 0
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding | - | 0
Qwen2.5-VL Technical Report | Code | 11
Assessing Generative AI value in a public sector context: evidence from a field experiment | - | 0
DocMIA: Document-Level Membership Inference Attacks against DocVQA Models | Code | 0
AIN: The Arabic INclusive Large Multimodal Model | Code | 2
Ocean-OCR: Towards General OCR Application via a Vision-Language Model | Code | 1
HERITAGE: An End-to-End Web Platform for Processing Korean Historical Documents in Hanja | Code | 0
BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations | - | 0
Survey on Question Answering over Visually Rich Documents: Methods, Challenges, and Trends | - | 0
Docopilot: Improving Multimodal Models for Document-Level Understanding | Code | 1
Page 2 of 13

No leaderboard results yet.