document understanding

Document understanding involves document classification, layout analysis, information extraction, and DocQA.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 151–200 of 309 papers

Title	Date	Tasks	Status
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report	Jun 17, 2024	document understanding	CodeCode Available
Enhancing Question Answering on Charts Through Effective Pre-training Tasks	Jun 14, 2024	document understandingOptical Character Recognition (OCR)	—Unverified
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications	Jun 12, 2024	document-image-classificationDocument Image Classification	—Unverified
Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use	May 30, 2024	document understandingKey Information Extraction	—Unverified
Notes on Applicability of GPT-4 to Document Understanding	May 28, 2024	document understandingOptical Character Recognition (OCR)	—Unverified
Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting	May 21, 2024	document-image-classificationDocument Image Classification	CodeCode Available
GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding	May 6, 2024	Contrastive Learningdocument understanding	CodeCode Available
CREPE: Coordinate-Aware End-to-End Document Parser	May 1, 2024	document understandingOptical Character Recognition (OCR)	—Unverified
Machine Unlearning for Document Classification	Apr 29, 2024	ClassificationDocument Classification	CodeCode Available
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism	Apr 29, 2024	document understandingGPU	CodeCode Available
A LayoutLMv3-Based Model for Enhanced Relation Extraction in Visually-Rich Documents	Apr 16, 2024	document understandingKey Information Extraction	—Unverified
Towards Efficient Resume Understanding: A Multi-Granularity Multi-Modal Pre-Training Approach	Apr 13, 2024	document understanding	—Unverified
HRVDA: High-Resolution Visual Document Assistant	Apr 10, 2024	document understanding	—Unverified
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding	Apr 8, 2024	Document AIdocument understanding	CodeCode Available
BuDDIE: A Business Document Dataset for Multi-task Information Extraction	Apr 5, 2024	Document Classificationdocument understanding	—Unverified
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition	Mar 28, 2024	Decoderdocument understanding	CodeCode Available
Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence	Mar 27, 2024	Document AIdocument understanding	—Unverified
LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding	Mar 21, 2024	document-image-classificationDocument Image Classification	—Unverified
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding	Mar 19, 2024	document understandingOptical Character Recognition (OCR)	CodeCode Available
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models	Feb 29, 2024	Contrastive Learningdocument understanding	—Unverified
3MVRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding	Feb 28, 2024	document understandingForm	CodeCode Available
Read and Think: An Efficient Step-wise Multimodal Language Model for Document Understanding and Reasoning	Feb 26, 2024	Data Augmentationdocument understanding	—Unverified
RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning	Feb 19, 2024	document understandingMedical Diagnosis	—Unverified
LAPDoc: Layout-Aware Prompting for Documents	Feb 15, 2024	document understandingKey Information Extraction	—Unverified
Financial Report Chunking for Effective Retrieval Augmented Generation	Feb 5, 2024	Chunkingdocument understanding	CodeCode Available
LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents	Jan 26, 2024	4kDocument AI	—Unverified
Long Context Compression with Activation Beacon	Jan 7, 2024	4kdocument understanding	CodeCode Available
Multimodal weighted graph representation for information extraction from visually rich documents.	Jan 5, 2024	Document Layout Analysisdocument understanding	CodeCode Available
DocGraphLM: Documental Graph Language Model for Information Extraction	Jan 5, 2024	document understandingLanguage Modeling	—Unverified
OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition	Jan 1, 2024	Decoderdocument understanding	CodeCode Available
On Scaling Up a Multilingual Vision and Language Model	Jan 1, 2024	document understandingIn-Context Learning	—Unverified
DocLLM: A layout-aware generative language model for multimodal document understanding	Dec 31, 2023	document understandingLanguage Modeling	—Unverified
SLJP: Semantic Extraction based Legal Judgment Prediction	Dec 13, 2023	document understandingPrediction	—Unverified
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding	Nov 20, 2023	document understandingLanguage Modeling	—Unverified
Efficient End-to-End Visual Document Understanding with Rationale Distillation	Nov 16, 2023	document understandingImage to text	—Unverified
DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency	Nov 9, 2023	document understandingKey Information Extraction	—Unverified
A Multi-Modal Multilingual Benchmark for Document Image Classification	Oct 25, 2023	ClassificationCross-Lingual Transfer	—Unverified
DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond	Oct 19, 2023	Document AIDocument Layout Analysis	CodeCode Available
Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API	Oct 7, 2023	Decoderdocument understanding	—Unverified
ProtoNER: Few shot Incremental Learning for Named Entity Recognition using Prototypical Networks	Oct 3, 2023	document understandingIncremental Learning	—Unverified
Finding Pragmatic Differences Between Disciplines	Sep 30, 2023	DiversityDocument Summarization	—Unverified
Document Understanding for Healthcare Referrals	Sep 22, 2023	document understandingManagement	—Unverified
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap	Sep 21, 2023	Contrastive Learningdocument understanding	CodeCode Available
KOSMOS-2.5: A Multimodal Literate Model	Sep 20, 2023	document understandingmodel	—Unverified
Long-Range Transformer Architectures for Document Understanding	Sep 11, 2023	document understandingInformation Retrieval	CodeCode Available
GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification	Sep 11, 2023	document-image-classificationDocument Image Classification	—Unverified
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration	Sep 3, 2023	Decoderdocument understanding	—Unverified
Vision Grid Transformer for Document Layout Analysis	Aug 29, 2023	Document AIDocument Layout Analysis	CodeCode Available
Workshop on Document Intelligence Understanding	Jul 31, 2023	document understandingVisual Question Answering (VQA)	—Unverified
MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary	Jul 24, 2023	document understandingOptical Character Recognition (OCR)	—Unverified

Show:10 25 50

← PrevPage 4 of 7Next →

No leaderboard results yet.