SOTAVerified

document understanding

Document understanding involves document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 101-150 of 309 papers

| Title | Status | Hype |
| --- | --- | --- |
| Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI | | 0 |
| OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models | Code | 0 |
| KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding | | 0 |
| Assessing Generative AI value in a public sector context: evidence from a field experiment | | 0 |
| DocMIA: Document-Level Membership Inference Attacks against DocVQA Models | Code | 0 |
| HERITAGE: An End-to-End Web Platform for Processing Korean Historical Documents in Hanja | Code | 0 |
| BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations | | 0 |
| Survey on Question Answering over Visually Rich Documents: Methods, Challenges, and Trends | | 0 |
| Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark | | 0 |
| Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models | | 0 |
| Memory-Augmented Agent Training for Business Document Understanding | | 0 |
| Learned Compression for Compressed Learning | Code | 0 |
| DocVLM: Make Your VLM an Efficient Reader | | 0 |
| Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Code | 0 |
| BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks | | 0 |
| MATATA: Weakly Supervised End-to-End MAthematical Tool-Augmented Reasoning for Tabular Applications | | 0 |
| DOGE: Towards Versatile Visual Document Grounding and Referring | | 0 |
| StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training | | 0 |
| Information Extraction from Heterogeneous Documents without Ground Truth Labels using Synthetic Label Generation and Knowledge Distillation | | 0 |
| Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding | | 0 |
| M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework | | 0 |
| Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding | | 0 |
| M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | | 0 |
| TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection | | 0 |
| LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding | | 0 |
| MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding | | 0 |
| Data-driven Coreference-based Ontology Building | Code | 0 |
| "What is the value of templates?" Rethinking Document Information Extraction Datasets for LLMs | | 0 |
| Harnessing Webpage UIs for Text-Rich Visual Understanding | | 0 |
| ChuLo: Chunk-Level Key Information Representation for Long Document Processing | Code | 0 |
| ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training | | 0 |
| DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models | | 0 |
| DAViD: Domain Adaptive Visually-Rich Document Understanding with Synthetic Insights | | 0 |
| Leveraging Long-Context Large Language Models for Multi-Document Understanding and Summarization in Enterprise Applications | | 0 |
| DocMamba: Efficient Document Pre-training with State Space Model | | 0 |
| Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5 | | 0 |
| Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network | Code | 0 |
| mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding | Code | 0 |
| ViRED: Prediction of Visual Relations in Engineering Drawings | | 0 |
| The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts | | 0 |
| SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding | | 0 |
| Building and better understanding vision-language models: insights and future directions | | 0 |
| Arctic-TILT. Business Document Understanding at Sub-Billion Scale | | 0 |
| Deep Learning based Visually Rich Document Content Understanding: A Survey | | 0 |
| Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review | | 0 |
| Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding | Code | 0 |
| NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition | | 0 |
| Hypergraph based Understanding for Document Semantic Entity Recognition | Code | 0 |
| DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming | | 0 |
| DrVideo: Document Retrieval Based Long Video Understanding | | 0 |

No leaderboard results yet.