SOTAVerified

document understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 101–125 of 309 papers

Title | Status | Hype
Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI | - | 0
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models | Code | 0
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding | - | 0
Assessing Generative AI value in a public sector context: evidence from a field experiment | - | 0
DocMIA: Document-Level Membership Inference Attacks against DocVQA Models | Code | 0
HERITAGE: An End-to-End Web Platform for Processing Korean Historical Documents in Hanja | Code | 0
BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations | - | 0
Survey on Question Answering over Visually Rich Documents: Methods, Challenges, and Trends | - | 0
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark | - | 0
Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models | - | 0
Memory-Augmented Agent Training for Business Document Understanding | - | 0
Learned Compression for Compressed Learning | Code | 0
DocVLM: Make Your VLM an Efficient Reader | - | 0
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Code | 0
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks | - | 0
MATATA: Weakly Supervised End-to-End MAthematical Tool-Augmented Reasoning for Tabular Applications | - | 0
DOGE: Towards Versatile Visual Document Grounding and Referring | - | 0
StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training | - | 0
Information Extraction from Heterogeneous Documents without Ground Truth Labels using Synthetic Label Generation and Knowledge Distillation | - | 0
Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding | - | 0
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework | - | 0
Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding | - | 0
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | - | 0
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection | - | 0
LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding | - | 0

No leaderboard results yet.