SOTAVerified

document understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 151–200 of 309 papers

Title | Status | Hype
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report | Code | 0
Enhancing Question Answering on Charts Through Effective Pre-training Tasks | - | 0
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications | - | 0
Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use | - | 0
Notes on Applicability of GPT-4 to Document Understanding | - | 0
Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting | Code | 0
GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding | Code | 0
CREPE: Coordinate-Aware End-to-End Document Parser | - | 0
Machine Unlearning for Document Classification | Code | 0
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Code | 0
A LayoutLMv3-Based Model for Enhanced Relation Extraction in Visually-Rich Documents | - | 0
Towards Efficient Resume Understanding: A Multi-Granularity Multi-Modal Pre-Training Approach | - | 0
HRVDA: High-Resolution Visual Document Assistant | - | 0
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding | Code | 0
BuDDIE: A Business Document Dataset for Multi-task Information Extraction | - | 0
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition | Code | 0
Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence | - | 0
LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding | - | 0
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding | Code | 0
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models | - | 0
3MVRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding | Code | 0
Read and Think: An Efficient Step-wise Multimodal Language Model for Document Understanding and Reasoning | - | 0
RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning | - | 0
LAPDoc: Layout-Aware Prompting for Documents | - | 0
Financial Report Chunking for Effective Retrieval Augmented Generation | Code | 0
LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents | - | 0
Long Context Compression with Activation Beacon | Code | 0
Multimodal weighted graph representation for information extraction from visually rich documents. | Code | 0
DocGraphLM: Documental Graph Language Model for Information Extraction | - | 0
OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition | Code | 0
On Scaling Up a Multilingual Vision and Language Model | - | 0
DocLLM: A layout-aware generative language model for multimodal document understanding | - | 0
SLJP: Semantic Extraction based Legal Judgment Prediction | - | 0
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding | - | 0
Efficient End-to-End Visual Document Understanding with Rationale Distillation | - | 0
DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency | - | 0
A Multi-Modal Multilingual Benchmark for Document Image Classification | - | 0
DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond | Code | 0
Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API | - | 0
ProtoNER: Few shot Incremental Learning for Named Entity Recognition using Prototypical Networks | - | 0
Finding Pragmatic Differences Between Disciplines | - | 0
Document Understanding for Healthcare Referrals | - | 0
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap | Code | 0
KOSMOS-2.5: A Multimodal Literate Model | - | 0
Long-Range Transformer Architectures for Document Understanding | Code | 0
GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification | - | 0
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration | - | 0
Vision Grid Transformer for Document Layout Analysis | Code | 0
Workshop on Document Intelligence Understanding | - | 0
MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary | - | 0

No leaderboard results yet.