SOTAVerified

document understanding

Document understanding involves document classification, layout analysis, information extraction, and DocQA.

Papers

Showing 251-300 of 309 papers

| Title | Status | Hype |
| --- | --- | --- |
| Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding | | 0 |
| Finding Pragmatic Differences Between Disciplines | | 0 |
| FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction | | 0 |
| FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction | | 0 |
| Friendly Topic Assistant for Transformer Based Abstractive Summarization | | 0 |
| From Entity Linking to Question Answering -- Recent Progress on Semantic Grounding Tasks | | 0 |
| Génération de question à partir d’analyse sémantique pour l’adaptation non supervisée de modèles de compréhension de documents (Question generation from semantic analysis for unsupervised adaptation of document understanding models) | | 0 |
| Graph Convolution for Multimodal Information Extraction from Visually Rich Documents | | 0 |
| Handling tree-structured text: parsing directory pages | | 0 |
| Harnessing Webpage UIs for Text-Rich Visual Understanding | | 0 |
| Hierarchical BERT for Medical Document Understanding | | 0 |
| Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models | | 0 |
| Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding | | 0 |
| How does Watermarking Affect Visual Language Models in Document Understanding? | | 0 |
| HRVDA: High-Resolution Visual Document Assistant | | 0 |
| Improving Applicability of Deep Learning based Token Classification models during Training | | 0 |
| Improving Keyphrase Extraction with Data Augmentation and Information Filtering | | 0 |
| Information Extraction from Heterogeneous Documents without Ground Truth Labels using Synthetic Label Generation and Knowledge Distillation | | 0 |
| Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding | | 0 |
| Joint Structured Learning and Predictions under Logical Constraints in Conditional Random Fields | | 0 |
| KeyVec: Key-semantics Preserving Document Representations | | 0 |
| KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding | | 0 |
| KOSMOS-2.5: A Multimodal Literate Model | | 0 |
| LAMPRET: Layout-Aware Multimodal PreTraining for Document Understanding | | 0 |
| LAPDoc: Layout-Aware Prompting for Documents | | 0 |
| LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding | | 0 |
| LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding | | 0 |
| Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5 | | 0 |
| Leveraging Domain Agnostic and Specific Knowledge for Acronym Disambiguation | | 0 |
| Leveraging Long-Context Large Language Models for Multi-Document Understanding and Summarization in Enterprise Applications | | 0 |
| LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents | | 0 |
| LoPE: Learnable Sinusoidal Positional Encoding for Improving Document Transformer Model | | 0 |
| LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding | | 0 |
| M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | | 0 |
| MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary | | 0 |
| MATATA: Weakly Supervised End-to-End MAthematical Tool-Augmented Reasoning for Tabular Applications | | 0 |
| MATrIX -- Modality-Aware Transformer for Information eXtraction | | 0 |
| Memory-Augmented Agent Training for Business Document Understanding | | 0 |
| Merge and Recognize: A Geometry and 2D Context Aware Graph Model for Named Entity Recognition from Visual Documents | | 0 |
| M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework | | 0 |
| MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding | | 0 |
| mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding | | 0 |
| mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding | | 0 |
| mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding | | 0 |
| MT^3: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning | | 0 |
| Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web | | 0 |
| NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition | | 0 |
| NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | | 0 |
| Notes on Applicability of GPT-4 to Document Understanding | | 0 |
| Object-oriented Neural Programming (OONP) for Document Understanding | | 0 |

No leaderboard results yet.