SOTAVerified

Document Understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 26-50 of 309 papers

Title | Status | Hype
LEMONADE: A Large Multilingual Expert-Annotated Abstractive Event Dataset for the Real World | Code | 1
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark | Code | 1
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding | Code | 1
FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding | Code | 1
Ocean-OCR: Towards General OCR Application via a Vision-Language Model | Code | 1
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Code | 1
Docopilot: Improving Multimodal Models for Document-Level Understanding | Code | 1
LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating | Code | 1
Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models | Code | 1
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark | Code | 1
Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding | Code | 1
DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Code | 1
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding | Code | 1
DANIEL: A fast Document Attention Network for Information Extraction and Labelling of handwritten documents | Code | 1
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning | Code | 1
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding | Code | 1
On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling | Code | 1
WordScape: a Pipeline to extract multilingual, visually rich Documents with Layout Annotations from Web Crawl Data | Code | 1
Privacy-Aware Document Visual Question Answering | Code | 1
Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs | Code | 1
DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading | Code | 1
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling | Code | 1
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents | Code | 1
DocFormerv2: Local Features for Document Understanding | Code | 1
PaLI-X: On Scaling up a Multilingual Vision and Language Model | Code | 1

No leaderboard results yet.