SOTAVerified

document understanding

Document understanding involves document classification, layout analysis, information extraction, and DocQA.

Papers

Showing 151-200 of 309 papers

Title | Status | Hype
GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification | | 0
Transformer-based Approach for Document Understanding | | 0
Two to Five Truths in Non-Negative Matrix Factorization | | 0
Understanding Long Documents with Different Position-Aware Attentions | | 0
UniDoc: Unified Pretraining Framework for Document Understanding | | 0
Unified Pretraining Framework for Document Understanding | | 0
Unimodal and Multimodal Representation Training for Relation Extraction | | 0
ViRED: Prediction of Visual Relations in Engineering Drawings | | 0
WebFormer: The Web-page Transformer for Structure Information Extraction | | 0
"What is the value of templates?" Rethinking Document Information Extraction Datasets for LLMs | | 0
What Makes a Good Dataset for Symbol Description Reading? | | 0
WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts | | 0
Workshop on Document Intelligence Understanding | | 0
XFUND: A Benchmark Dataset for Multilingual Visually Rich Form Understanding | | 0
Deep Learning based Visually Rich Document Content Understanding: A Survey | | 0
Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models | | 0
WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild? | | 0
VRDU: A Benchmark for Visually-rich Document Understanding | | 0
Acronym Identification and Disambiguation Shared Tasks for Scientific Document Understanding | | 0
A LayoutLMv3-Based Model for Enhanced Relation Extraction in Visually-Rich Documents | | 0
A Multi-Modal Multilingual Benchmark for Document Image Classification | | 0
Arctic-TILT. Business Document Understanding at Sub-Billion Scale | | 0
A Retrospective Recount of Computer Architecture Research with a Data-Driven Study of Over Four Decades of ISCA Publications | | 0
A Simple yet Effective Layout Token in Large Language Models for Document Understanding | | 0
Assessing Generative AI value in a public sector context: evidence from a field experiment | | 0
A Survey and Approach to Chart Classification | | 0
A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends | | 0
A Survey on Vietnamese Document Analysis and Recognition: Challenges and Future Directions | | 0
AT-BERT: Adversarial Training BERT for Acronym Identification Winning Solution for SDU@AAAI-21 | | 0
A Token-level Text Image Foundation Model for Document Understanding | | 0
Attention-Based Graph Neural Network with Global Context Awareness for Document Understanding | | 0
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration | | 0
A User-Centered Concept Mining System for Query and Document Understanding at Tencent | | 0
Auto-encodeurs pour la compréhension de documents parlés (Auto-encoders for Spoken Document Understanding) | | 0
Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer | | 0
Automatic Knowledge Extraction with Human Interface | | 0
AWESOME: GPU Memory-constrained Long Document Summarization using Memory Mechanism and Global Salient Content | | 0
BERT-AL: BERT for Arbitrarily Long Document Understanding | | 0
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks | | 0
Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding | | 0
BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations | | 0
BROS: A Pre-trained Language Model for Understanding Texts in Document | | 0
BuDDIE: A Business Document Dataset for Multi-task Information Extraction | | 0
Building and better understanding vision-language models: insights and future directions | | 0
Calculating Semantic Similarity between Academic Articles using Topic Event and Ontology | | 0
Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence | | 0
Read and Think: An Efficient Step-wise Multimodal Language Model for Document Understanding and Reasoning | | 0
ClueWeb22: 10 Billion Web Documents with Visual and Semantic Information | | 0
CREPE: Coordinate-Aware End-to-End Document Parser | | 0
DAViD: Domain Adaptive Visually-Rich Document Understanding with Synthetic Insights | | 0

No leaderboard results yet.