
document understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 101-150 of 309 papers

Title | Status | Hype
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models | Code | 0
OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition | Code | 0
Multimodal weighted graph representation for information extraction from visually rich documents | Code | 0
Multimodal Tree Decoder for Table of Contents Extraction in Document Images | Code | 0
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Code | 0
ERNIE-Layout: Layout-Knowledge Enhanced Multi-modal Pre-training for Document Understanding | Code | 0
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report | Code | 0
Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting | Code | 0
EvaLDA: Efficient Evasion Attacks Towards Latent Dirichlet Allocation | Code | 0
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models | Code | 0
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding | Code | 0
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition | Code | 0
PP-DocBee2: Improved Baselines with Efficient Data for Multimodal Document Understanding | Code | 0
DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding | Code | 0
DocMIA: Document-Level Membership Inference Attacks against DocVQA Models | Code | 0
Message Passing Attention Networks for Document Understanding | Code | 0
A Survey of Deep Learning Approaches for OCR and Document Understanding | Code | 0
Blockwise Self-Attention for Long Document Understanding | Code | 0
Is ChatGPT A Good Keyphrase Generator? A Preliminary Study | Code | 0
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Code | 0
MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding | Code | 0
Matching Article Pairs with Graphical Decomposition and Convolutions | Code | 0
M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization? | Code | 0
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding | Code | 0
M⁶Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | Code | 0
Improving Clinical Document Understanding on COVID-19 Research with Spark NLP | Code | 0
Long-Range Transformer Architectures for Document Understanding | Code | 0
Hypergraph based Understanding for Document Semantic Entity Recognition | Code | 0
Machine Unlearning for Document Classification | Code | 0
Bidirectional Context-Aware Hierarchical Attention Network for Document Understanding | Code | 0
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding | Code | 0
Learned Compression for Compressed Learning | Code | 0
HERITAGE: An End-to-End Web Platform for Processing Korean Historical Documents in Hanja | Code | 0
BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction | Code | 0
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding | Code | 0
KALM: Knowledge-Aware Integration of Local, Document, and Global Contexts for Long Document Understanding | Code | 0
Knowing Where and What: Unified Word Block Pretraining for Document Understanding | Code | 0
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | Code | 0
MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding | Code | 0
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding | Code | 0
3MVRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding | Code | 0
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing | Code | 0
PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks | Code | 0
Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network | Code | 0
DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning | - | 0
Génération de question à partir d’analyse sémantique pour l’adaptation non supervisée de modèles de compréhension de documents (Question generation from semantic analysis for unsupervised adaptation of document understanding models) | - | 0
BERT-AL: BERT for Arbitrarily Long Document Understanding | - | 0
From Entity Linking to Question Answering -- Recent Progress on Semantic Grounding Tasks | - | 0
Friendly Topic Assistant for Transformer Based Abstractive Summarization | - | 0
Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review | - | 0

No leaderboard results yet.