document understanding

Document understanding involves document classification, layout analysis, information extraction, and DocQA.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 101–150 of 309 papers

Title	Date	Tasks	Status	Hype
Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding	Jul 19, 2024	document understandingInformativeness	CodeCode Available	0
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding	Jul 17, 2024	document understandingOptical Character Recognition (OCR)	CodeCode Available	1
NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition	Jul 16, 2024	Decoderdocument understanding	—Unverified	0
DANIEL: A fast Document Attention Network for Information Extraction and Labelling of handwritten documents	Jul 12, 2024	Document Layout Analysisdocument understanding	CodeCode Available	1
Hypergraph based Understanding for Document Semantic Entity Recognition	Jul 9, 2024	document understanding	CodeCode Available	0
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding	Jul 2, 2024	document understandingKey Information Extraction	CodeCode Available	2
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations	Jul 1, 2024	Benchmarkingdocument understanding	CodeCode Available	2
ColPali: Efficient Document Retrieval with Vision Language Models	Jun 27, 2024	document understandingRAG	CodeCode Available	7
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming	Jun 27, 2024	document understanding	—Unverified	0
DrVideo: Document Retrieval Based Long Video Understanding	Jun 18, 2024	document understandingEgoSchema	—Unverified	0
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report	Jun 17, 2024	document understanding	CodeCode Available	0
Enhancing Question Answering on Charts Through Effective Pre-training Tasks	Jun 14, 2024	document understandingOptical Character Recognition (OCR)	—Unverified	0
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications	Jun 12, 2024	document-image-classificationDocument Image Classification	—Unverified	0
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning	Jun 4, 2024	document understandingGPU	CodeCode Available	1
Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use	May 30, 2024	document understandingKey Information Extraction	—Unverified	0
Notes on Applicability of GPT-4 to Document Understanding	May 28, 2024	document understandingOptical Character Recognition (OCR)	—Unverified	0
Focus Anywhere for Fine-grained Multi-page Document Understanding	May 23, 2024	document understandingOptical Character Recognition (OCR)	CodeCode Available	5
Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting	May 21, 2024	document-image-classificationDocument Image Classification	CodeCode Available	0
GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding	May 6, 2024	Contrastive Learningdocument understanding	CodeCode Available	0
CREPE: Coordinate-Aware End-to-End Document Parser	May 1, 2024	document understandingOptical Character Recognition (OCR)	—Unverified	0
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism	Apr 29, 2024	document understandingGPU	CodeCode Available	0
Machine Unlearning for Document Classification	Apr 29, 2024	ClassificationDocument Classification	CodeCode Available	0
A LayoutLMv3-Based Model for Enhanced Relation Extraction in Visually-Rich Documents	Apr 16, 2024	document understandingKey Information Extraction	—Unverified	0
Towards Efficient Resume Understanding: A Multi-Granularity Multi-Modal Pre-Training Approach	Apr 13, 2024	document understanding	—Unverified	0
HRVDA: High-Resolution Visual Document Assistant	Apr 10, 2024	document understanding	—Unverified	0
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding	Apr 8, 2024	Document AIdocument understanding	CodeCode Available	0
BuDDIE: A Business Document Dataset for Multi-task Information Extraction	Apr 5, 2024	Document Classificationdocument understanding	—Unverified	0
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition	Mar 28, 2024	Decoderdocument understanding	—Unverified	0
Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence	Mar 27, 2024	Document AIdocument understanding	—Unverified	0
Visually Guided Generative Text-Layout Pre-training for Document Intelligence	Mar 25, 2024	Document Classificationdocument understanding	CodeCode Available	2
LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding	Mar 21, 2024	document-image-classificationDocument Image Classification	—Unverified	0
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding	Mar 19, 2024	document understandingOptical Character Recognition (OCR)	—Unverified	0
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document	Mar 7, 2024	document understandingKey Information Extraction	CodeCode Available	5
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models	Feb 29, 2024	Contrastive Learningdocument understanding	—Unverified	0
3MVRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding	Feb 28, 2024	document understandingForm	CodeCode Available	0
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding	Feb 28, 2024	document understandingInformation Retrieval	CodeCode Available	1
Read and Think: An Efficient Step-wise Multimodal Language Model for Document Understanding and Reasoning	Feb 26, 2024	Data Augmentationdocument understanding	—Unverified	0
RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning	Feb 19, 2024	document understandingMedical Diagnosis	—Unverified	0
LAPDoc: Layout-Aware Prompting for Documents	Feb 15, 2024	document understandingKey Information Extraction	—Unverified	0
Financial Report Chunking for Effective Retrieval Augmented Generation	Feb 5, 2024	Chunkingdocument understanding	CodeCode Available	0
LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents	Jan 26, 2024	4kDocument AI	—Unverified	0
On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling	Jan 25, 2024	DecoderDiversity	CodeCode Available	1
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions	Jan 24, 2024	document understandingQuestion Answering	CodeCode Available	2
INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning	Jan 12, 2024	Diversitydocument understanding	CodeCode Available	3
Long Context Compression with Activation Beacon	Jan 7, 2024	4kdocument understanding	—Unverified	0
Multimodal weighted graph representation for information extraction from visually rich documents.	Jan 5, 2024	Document Layout Analysisdocument understanding	CodeCode Available	0
DocGraphLM: Documental Graph Language Model for Information Extraction	Jan 5, 2024	document understandingLanguage Modeling	—Unverified	0
On Scaling Up a Multilingual Vision and Language Model	Jan 1, 2024	document understandingIn-Context Learning	—Unverified	0
OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition	Jan 1, 2024	Decoderdocument understanding	—Unverified	0
DocLLM: A layout-aware generative language model for multimodal document understanding	Dec 31, 2023	document understandingLanguage Modeling	—Unverified	0

Show:10 25 50

← PrevPage 3 of 7Next →

No leaderboard results yet.