SOTAVerified

Document Understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 51–75 of 309 papers

Title | Status | Hype
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark | | 0
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Code | 1
LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating | Code | 1
Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models | Code | 1
Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models | | 0
Memory-Augmented Agent Training for Business Document Understanding | | 0
Learned Compression for Compressed Learning | Code | 0
DocVLM: Make Your VLM an Efficient Reader | | 0
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Code | 0
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks | | 0
MATATA: Weakly Supervised End-to-End MAthematical Tool-Augmented Reasoning for Tabular Applications | | 0
DOGE: Towards Versatile Visual Document Grounding and Referring | | 0
StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training | | 0
Information Extraction from Heterogeneous Documents without Ground Truth Labels using Synthetic Label Generation and Knowledge Distillation | | 0
Arabic-Nougat: Fine-Tuning Vision Transformers for Arabic OCR and Markdown Extraction | Code | 2
Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding | | 0
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework | | 0
Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding | | 0
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | | 0
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection | | 0
LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding | | 0
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding | | 0
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark | Code | 1
Data-driven Coreference-based Ontology Building | Code | 0
"What is the value of templates?" Rethinking Document Information Extraction Datasets for LLMs | | 0

No leaderboard results yet.