SOTAVerified

document understanding

Document understanding involves document classification, layout analysis, information extraction, and DocQA.

Papers

Showing 251-300 of 309 papers

Title | Status | Hype
Deeper Clinical Document Understanding Using Relation Extraction | Code | 0
DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering | Code | 0
Relation-Rich Visual Document Generator for Visual Information Extraction | Code | 0
3MVRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding | Code | 0
M^6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | Code | 0
Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network | Code | 0
Machine Unlearning for Document Classification | Code | 0
MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding | Code | 0
MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding | Code | 0
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Code | 0
XDoc: Unified Pre-training for Cross-Format Document Understanding | Code | 0
Zero-Shot Complex Question-Answering on Long Scientific Documents | Code | 0
Matching Article Pairs with Graphical Decomposition and Convolutions | Code | 0
ChuLo: Chunk-Level Key Information Representation for Long Document Processing | Code | 0
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing | Code | 0
M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization? | Code | 0
Improving Clinical Document Understanding on COVID-19 Research with Spark NLP | Code | 0
Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding | Code | 0
BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction | Code | 0
Message Passing Attention Networks for Document Understanding | Code | 0
Chargrid: Towards Understanding 2D Documents | Code | 0
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap | Code | 0
Hypergraph based Understanding for Document Semantic Entity Recognition | Code | 0
HERITAGE: An End-to-End Web Platform for Processing Korean Historical Documents in Hanja | Code | 0
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding | Code | 0
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding | Code | 0
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding | Code | 0
A Survey of Deep Learning Approaches for OCR and Document Understanding | Code | 0
Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting | Code | 0
DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding | Code | 0
GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding | Code | 0
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report | Code | 0
Multimodal Tree Decoder for Table of Contents Extraction in Document Images | Code | 0
Multimodal weighted graph representation for information extraction from visually rich documents. | Code | 0
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Code | 0
SFDLA: Source-Free Document Layout Analysis | Code | 0
Blockwise Self-Attention for Long Document Understanding | Code | 0
Data-driven Coreference-based Ontology Building | Code | 0
DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond | Code | 0
Financial Report Chunking for Effective Retrieval Augmented Generation | Code | 0
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition | Code | 0
OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition | Code | 0
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models | Code | 0
KRED: Knowledge-Aware Document Representation for News Recommendations | Code | 0
Skim-Attention: Learning to Focus via Document Layout | Code | 0
Vision Grid Transformer for Document Layout Analysis | Code | 0
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Code | 0
Long Context Compression with Activation Beacon | Code | 0
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models | Code | 0
PaddleOCR 3.0 Technical Report | Code | 0

No leaderboard results yet.