Document Understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 51–100 of 309 papers

Title	Date	Tasks	Status	Hype
Docopilot: Improving Multimodal Models for Document-Level Understanding	Jan 1, 2025	document understanding, RAG	Code Available	1
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding	Jan 1, 2025	document understanding, Optical Character Recognition (OCR)	Code Available	1
LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating	Dec 24, 2024	document understanding, Question Answering	Code Available	1
Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models	Dec 18, 2024	Document Classification, document-image-classification	Unverified	0
Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models	Dec 18, 2024	document understanding, Image Captioning	Code Available	1
Memory-Augmented Agent Training for Business Document Understanding	Dec 17, 2024	document understanding	Unverified	0
Learned Compression for Compressed Learning	Dec 12, 2024	Colorization, document understanding	Code Available	0
DocVLM: Make Your VLM an Efficient Reader	Dec 11, 2024	document understanding, Optical Character Recognition (OCR)	Unverified	0
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	Dec 6, 2024	document understanding, Hallucination	Unverified	0
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks	Dec 5, 2024	Code Generation, document understanding	Unverified	0
MATATA: Weakly Supervised End-to-End MAthematical Tool-Augmented Reasoning for Tabular Applications	Nov 28, 2024	document understanding, Mathematical Reasoning	Unverified	0
DOGE: Towards Versatile Visual Document Grounding and Referring	Nov 26, 2024	document understanding	Unverified	0
StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training	Nov 25, 2024	document understanding, Language Modeling	Unverified	0
Information Extraction from Heterogeneous Documents without Ground Truth Labels using Synthetic Label Generation and Knowledge Distillation	Nov 22, 2024	Anomaly Detection, document understanding	Unverified	0
Arabic-Nougat: Fine-Tuning Vision Transformers for Arabic OCR and Markdown Extraction	Nov 19, 2024	document understanding, Optical Character Recognition (OCR)	Code Available	2
Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding	Nov 12, 2024	document understanding, Optical Character Recognition (OCR)	Unverified	0
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework	Nov 9, 2024	document understanding, Question Answering	Unverified	0
Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding	Nov 8, 2024	document understanding, Optical Character Recognition (OCR)	Unverified	0
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding	Nov 7, 2024	document understanding, Optical Character Recognition	Unverified	0
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection	Nov 5, 2024	document understanding	Unverified	0
LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding	Nov 2, 2024	document understanding, Question Answering	Unverified	0
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding	Oct 25, 2024	Benchmarking, document understanding	Unverified	0
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark	Oct 24, 2024	document understanding, Video Understanding	Code Available	1
Data-driven Coreference-based Ontology Building	Oct 22, 2024	coreference-resolution, Coreference Resolution	Code Available	0
"What is the value of templates?" Rethinking Document Information Extraction Datasets for LLMs	Oct 20, 2024	document understanding, Key Information Extraction	Unverified	0
Harnessing Webpage UIs for Text-Rich Visual Understanding	Oct 17, 2024	document understanding, Optical Character Recognition (OCR)	Unverified	0
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception	Oct 16, 2024	Document Layout Analysis, document understanding	Code Available	9
ChuLo: Chunk-Level Key Information Representation for Long Document Processing	Oct 14, 2024	Chunking, Classification	Code Available	0
ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training	Oct 14, 2024	document understanding, Optical Character Recognition (OCR)	Unverified	0
LLMMapReduce: Simplified Long-Sequence Processing using Large Language Models	Oct 12, 2024	document understanding	Code Available	4
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling	Oct 8, 2024	document understanding, Language Modeling	Code Available	2
DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models	Oct 4, 2024	document understanding, Knowledge Distillation	Unverified	0
DAViD: Domain Adaptive Visually-Rich Document Understanding with Synthetic Insights	Oct 2, 2024	document understanding, Domain Adaptation	Unverified	0
Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding	Sep 29, 2024	document understanding, Entity Linking	Code Available	1
Leveraging Long-Context Large Language Models for Multi-Document Understanding and Summarization in Enterprise Applications	Sep 27, 2024	Diversity, Document Summarization	Unverified	0
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution	Sep 19, 2024	document understanding, Video Question Answering	Code Available	3
DocMamba: Efficient Document Pre-training with State Space Model	Sep 18, 2024	document understanding	Unverified	0
Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5	Sep 17, 2024	document understanding, Transfer Learning	Unverified	0
One missing piece in Vision and Language: A Survey on Comics Understanding	Sep 14, 2024	document understanding, image-classification	Code Available	2
Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network	Sep 11, 2024	Document Layout Analysis, document understanding	Code Available	0
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding	Sep 5, 2024	document understanding, GPU	Unverified	0
ViRED: Prediction of Visual Relations in Engineering Drawings	Sep 2, 2024	Decoder, document understanding	Unverified	0
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts	Aug 31, 2024	document understanding, token-classification	Unverified	0
DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding	Aug 27, 2024	document understanding, Optical Character Recognition (OCR)	Code Available	1
SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding	Aug 27, 2024	document understanding	Unverified	0
Building and better understanding vision-language models: insights and future directions	Aug 22, 2024	document understanding	Unverified	0
Arctic-TILT. Business Document Understanding at Sub-Billion Scale	Aug 8, 2024	document understanding, GPU	Unverified	0
Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid	Aug 4, 2024	document understanding	Code Available	5
Deep Learning based Visually Rich Document Content Understanding: A Survey	Aug 2, 2024	Deep Learning, document understanding	Unverified	0
Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review	Jul 23, 2024	Deep Learning, document understanding	Unverified	0
