SOTAVerified

document understanding

Document understanding involves document classification, layout analysis, information extraction, and DocQA.

Papers

Showing 101-125 of 309 papers

Title | Status | Hype
Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding | Code | 0
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding | Code | 1
NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition | | 0
DANIEL: A fast Document Attention Network for Information Extraction and Labelling of handwritten documents | Code | 1
Hypergraph based Understanding for Document Semantic Entity Recognition | Code | 0
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding | Code | 2
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations | Code | 2
ColPali: Efficient Document Retrieval with Vision Language Models | Code | 7
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming | | 0
DrVideo: Document Retrieval Based Long Video Understanding | | 0
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report | Code | 0
Enhancing Question Answering on Charts Through Effective Pre-training Tasks | | 0
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications | | 0
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning | Code | 1
Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use | | 0
Notes on Applicability of GPT-4 to Document Understanding | | 0
Focus Anywhere for Fine-grained Multi-page Document Understanding | Code | 5
Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting | Code | 0
GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding | Code | 0
CREPE: Coordinate-Aware End-to-End Document Parser | | 0
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Code | 0
Machine Unlearning for Document Classification | Code | 0
A LayoutLMv3-Based Model for Enhanced Relation Extraction in Visually-Rich Documents | | 0
Towards Efficient Resume Understanding: A Multi-Granularity Multi-Modal Pre-Training Approach | | 0
HRVDA: High-Resolution Visual Document Assistant | | 0

No leaderboard results yet.