SOTAVerified

document understanding

Document understanding involves document classification, layout analysis, information extraction, and DocQA.

Papers

Showing 126–150 of 309 papers

Title | Status | Hype
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding | Code | 0
BuDDIE: A Business Document Dataset for Multi-task Information Extraction | - | 0
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition | - | 0
Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence | - | 0
Visually Guided Generative Text-Layout Pre-training for Document Intelligence | Code | 2
LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding | - | 0
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding | - | 0
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document | Code | 5
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models | - | 0
3MVRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding | Code | 0
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding | Code | 1
Read and Think: An Efficient Step-wise Multimodal Language Model for Document Understanding and Reasoning | - | 0
RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning | - | 0
LAPDoc: Layout-Aware Prompting for Documents | - | 0
Financial Report Chunking for Effective Retrieval Augmented Generation | Code | 0
LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents | - | 0
On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling | Code | 1
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions | Code | 2
INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning | Code | 3
Long Context Compression with Activation Beacon | - | 0
Multimodal weighted graph representation for information extraction from visually rich documents. | Code | 0
DocGraphLM: Documental Graph Language Model for Information Extraction | - | 0
On Scaling Up a Multilingual Vision and Language Model | - | 0
OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition | - | 0
DocLLM: A layout-aware generative language model for multimodal document understanding | - | 0

No leaderboard results yet.