SOTAVerified

Document Understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 76–100 of 309 papers

| Title | Status | Hype |
| --- | --- | --- |
| WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts | | 0 |
| DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning | | 0 |
| A Survey on Vietnamese Document Analysis and Recognition: Challenges and Future Directions | | 0 |
| Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing | Code | 0 |
| MT^3: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning | | 0 |
| Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning | | 0 |
| Doc-CoB: Enhancing Multi-Modal Document Understanding with Visual Chain-of-Boxes Reasoning | | 0 |
| The Hidden Structure -- Improving Legal Document Understanding Through Explicit Text Formatting | | 0 |
| WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild? | | 0 |
| Document Image Rectification Bases on Self-Adaptive Multitask Fusion | | 0 |
| Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer | | 0 |
| Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models | Code | 0 |
| Relation-Rich Visual Document Generator for Visual Information Extraction | Code | 0 |
| NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | | 0 |
| QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding | | 0 |
| How does Watermarking Affect Visual Language Models in Document Understanding? | | 0 |
| Improving Applicability of Deep Learning based Token Classification models during Training | | 0 |
| M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization? | Code | 0 |
| BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction | Code | 0 |
| SFDLA: Source-Free Document Layout Analysis | Code | 0 |
| A Simple yet Effective Layout Token in Large Language Models for Document Understanding | | 0 |
| Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Code | 0 |
| PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks | Code | 0 |
| A Token-level Text Image Foundation Model for Document Understanding | | 0 |
| Zero-Shot Complex Question-Answering on Long Scientific Documents | Code | 0 |

No leaderboard results yet.