SOTAVerified

document understanding

Document understanding involves document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 51–100 of 309 papers

Title | Status | Hype
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding | Code | 1
DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Code | 1
DANIEL: A fast Document Attention Network for Information Extraction and Labelling of handwritten documents | Code | 1
FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding | Code | 1
Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs | Code | 1
Value Retrieval with Arbitrary Queries for Form-like Documents | Code | 1
Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding | Code | 1
On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling | Code | 1
DocFormerv2: Local Features for Document Understanding | Code | 1
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding | Code | 1
SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement | Code | 1
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling | Code | 1
Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks | Code | 1
PaLI-X: On Scaling up a Multilingual Vision and Language Model | Code | 1
End-to-end Document Recognition and Understanding with Dessurt | Code | 1
On Web-based Visual Corpus Construction for Visual Document Understanding | Code | 1
A Discrete Variational Recurrent Topic Model without the Reparametrization Trick | Code | 1
DocFormer: End-to-End Transformer for Document Understanding | Code | 1
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding | Code | 1
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Code | 0
Multimodal Tree Decoder for Table of Contents Extraction in Document Images | Code | 0
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report | Code | 0
Multimodal weighted graph representation for information extraction from visually rich documents. | Code | 0
Deeper Clinical Document Understanding Using Relation Extraction | Code | 0
Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting | Code | 0
Message Passing Attention Networks for Document Understanding | Code | 0
Data-driven Coreference-based Ontology Building | Code | 0
Matching Article Pairs with Graphical Decomposition and Convolutions | Code | 0
M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization? | Code | 0
MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding | Code | 0
MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding | Code | 0
3MVRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding | Code | 0
M^6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | Code | 0
DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Code | 0
Class-Agnostic Region-of-Interest Matching in Document Images | Code | 0
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Code | 0
ChuLo: Chunk-Level Key Information Representation for Long Document Processing | Code | 0
Chargrid: Towards Understanding 2D Documents | Code | 0
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding | Code | 0
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | Code | 0
Learned Compression for Compressed Learning | Code | 0
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding | Code | 0
Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models | Code | 0
Is ChatGPT A Good Keyphrase Generator? A Preliminary Study | Code | 0
Information Redundancy and Biases in Public Document Information Extraction Benchmarks | Code | 0
KALM: Knowledge-Aware Integration of Local, Document, and Global Contexts for Long Document Understanding | Code | 0
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing | Code | 0
Improving Clinical Document Understanding on COVID-19 Research with Spark NLP | Code | 0
Machine Unlearning for Document Classification | Code | 0
Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network | Code | 0

No leaderboard results yet.