Document Understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 101–150 of 309 papers

Title	Date	Tasks	Status
Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI	Feb 24, 2025	document understanding, Multimodal Reasoning	Unverified
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models	Feb 22, 2025	document understanding, Key Information Extraction	Code Available
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding	Feb 20, 2025	document understanding, Optical Character Recognition	Unverified
Assessing Generative AI value in a public sector context: evidence from a field experiment	Feb 13, 2025	document understanding	Unverified
DocMIA: Document-Level Membership Inference Attacks against DocVQA Models	Feb 6, 2025	document understanding, Inference Attack	Code Available
HERITAGE: An End-to-End Web Platform for Processing Korean Historical Documents in Hanja	Jan 21, 2025	document understanding, Machine Translation	Code Available
BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations	Jan 6, 2025	Document AI, document understanding	Unverified
Survey on Question Answering over Visually Rich Documents: Methods, Challenges, and Trends	Jan 4, 2025	document understanding, Question Answering	Unverified
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark	Jan 1, 2025	document understanding, Image Retrieval	Unverified
Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models	Dec 18, 2024	Document Classification, document-image-classification	Unverified
Memory-Augmented Agent Training for Business Document Understanding	Dec 17, 2024	document understanding	Unverified
Learned Compression for Compressed Learning	Dec 12, 2024	Colorization, document understanding	Code Available
DocVLM: Make Your VLM an Efficient Reader	Dec 11, 2024	document understanding, Optical Character Recognition (OCR)	Unverified
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	Dec 6, 2024	document understanding, Hallucination	Code Available
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks	Dec 5, 2024	Code Generation, document understanding	Unverified
MATATA: Weakly Supervised End-to-End MAthematical Tool-Augmented Reasoning for Tabular Applications	Nov 28, 2024	document understanding, Mathematical Reasoning	Unverified
DOGE: Towards Versatile Visual Document Grounding and Referring	Nov 26, 2024	document understanding	Unverified
StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training	Nov 25, 2024	document understanding, Language Modeling	Unverified
Information Extraction from Heterogeneous Documents without Ground Truth Labels using Synthetic Label Generation and Knowledge Distillation	Nov 22, 2024	Anomaly Detection, document understanding	Unverified
Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding	Nov 12, 2024	document understanding, Optical Character Recognition (OCR)	Unverified
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework	Nov 9, 2024	document understanding, Question Answering	Unverified
Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding	Nov 8, 2024	document understanding, Optical Character Recognition (OCR)	Unverified
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding	Nov 7, 2024	document understanding, Optical Character Recognition	Unverified
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection	Nov 5, 2024	document understanding	Unverified
LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding	Nov 2, 2024	document understanding, Question Answering	Unverified
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding	Oct 25, 2024	Benchmarking, document understanding	Unverified
Data-driven Coreference-based Ontology Building	Oct 22, 2024	coreference-resolution, Coreference Resolution	Code Available
"What is the value of templates?" Rethinking Document Information Extraction Datasets for LLMs	Oct 20, 2024	document understanding, Key Information Extraction	Unverified
Harnessing Webpage UIs for Text-Rich Visual Understanding	Oct 17, 2024	document understanding, Optical Character Recognition (OCR)	Unverified
ChuLo: Chunk-Level Key Information Representation for Long Document Processing	Oct 14, 2024	Chunking, Classification	Code Available
ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training	Oct 14, 2024	document understanding, Optical Character Recognition (OCR)	Unverified
DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models	Oct 4, 2024	document understanding, Knowledge Distillation	Unverified
DAViD: Domain Adaptive Visually-Rich Document Understanding with Synthetic Insights	Oct 2, 2024	document understanding, Domain Adaptation	Unverified
Leveraging Long-Context Large Language Models for Multi-Document Understanding and Summarization in Enterprise Applications	Sep 27, 2024	Diversity, Document Summarization	Unverified
DocMamba: Efficient Document Pre-training with State Space Model	Sep 18, 2024	document understanding	Unverified
Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5	Sep 17, 2024	document understanding, Transfer Learning	Unverified
Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network	Sep 11, 2024	Document Layout Analysis, document understanding	Code Available
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding	Sep 5, 2024	document understanding, GPU	Code Available
ViRED: Prediction of Visual Relations in Engineering Drawings	Sep 2, 2024	Decoder, document understanding	Unverified
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts	Aug 31, 2024	document understanding, token-classification	Unverified
SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding	Aug 27, 2024	document understanding	Unverified
Building and better understanding vision-language models: insights and future directions	Aug 22, 2024	document understanding	Unverified
Arctic-TILT. Business Document Understanding at Sub-Billion Scale	Aug 8, 2024	document understanding, GPU	Unverified
Deep Learning based Visually Rich Document Content Understanding: A Survey	Aug 2, 2024	Deep Learning, document understanding	Unverified
Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review	Jul 23, 2024	Deep Learning, document understanding	Unverified
Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding	Jul 19, 2024	document understanding, Informativeness	Code Available
NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition	Jul 16, 2024	Decoder, document understanding	Unverified
Hypergraph based Understanding for Document Semantic Entity Recognition	Jul 9, 2024	document understanding	Code Available
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming	Jun 27, 2024	document understanding	Unverified
DrVideo: Document Retrieval Based Long Video Understanding	Jun 18, 2024	document understanding, EgoSchema	Unverified

