SOTAVerified

Document Understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 151–200 of 309 papers

Title | Status | Hype
DrVideo: Document Retrieval Based Long Video Understanding | - | 0
DUBLIN -- Document Understanding By Language-Image Network | - | 0
Efficient End-to-End Visual Document Understanding with Rationale Distillation | - | 0
Efficient layout-aware pretraining for multimodal form understanding | - | 0
Enhancing Question Answering on Charts Through Effective Pre-training Tasks | - | 0
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models | - | 0
Enumeration of Extractive Oracle Summaries | - | 0
ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding | - | 0
Extract with Order for Coherent Multi-Document Summarization | - | 0
Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding | - | 0
Finding Pragmatic Differences Between Disciplines | - | 0
FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction | - | 0
Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5 | - | 0
Leveraging Domain Agnostic and Specific Knowledge for Acronym Disambiguation | - | 0
Leveraging Long-Context Large Language Models for Multi-Document Understanding and Summarization in Enterprise Applications | - | 0
LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents | - | 0
LoPE: Learnable Sinusoidal Positional Encoding for Improving Document Transformer Model | - | 0
LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding | - | 0
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | - | 0
MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary | - | 0
MATATA: Weakly Supervised End-to-End MAthematical Tool-Augmented Reasoning for Tabular Applications | - | 0
MATrIX -- Modality-Aware Transformer for Information eXtraction | - | 0
Memory-Augmented Agent Training for Business Document Understanding | - | 0
Merge and Recognize: A Geometry and 2D Context Aware Graph Model for Named Entity Recognition from Visual Documents | - | 0
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework | - | 0
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding | - | 0
MT^3: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning | - | 0
Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web | - | 0
NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition | - | 0
NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | - | 0
Notes on Applicability of GPT-4 to Document Understanding | - | 0
Object-oriented Neural Programming (OONP) for Document Understanding | - | 0
One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text | - | 0
On Scaling Up a Multilingual Vision and Language Model | - | 0
OPAD: An Optimized Policy-based Active Learning Framework for Document Content Analysis | - | 0
PDFVQA: A New Dataset for Real-World VQA on PDF Documents | - | 0
Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning | - | 0
Position Masking for Improved Layout-Aware Document Understanding | - | 0
Probing Position-Aware Attention Mechanism in Long Document Understanding | - | 0
ProtoNER: Few shot Incremental Learning for Named Entity Recognition using Prototypical Networks | - | 0
PSG: Prompt-based Sequence Generation for Acronym Extraction | - | 0
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding | - | 0
QueryForm: A Simple Zero-shot Form Entity Query Framework | - | 0
RDU: A Region-based Approach to Form-style Document Understanding | - | 0
Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API | - | 0
ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training | - | 0
Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use | - | 0
Revisiting Table Detection Datasets for Visually Rich Documents | - | 0
RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning | - | 0
Robust Text Line Detection in Historical Documents: Learning and Evaluation Methods | - | 0

No leaderboard results yet.