document understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers


Showing 101–125 of 309 papers

Title	Date	Tasks	Status
Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI	Feb 24, 2025	document understanding, Multimodal Reasoning	Unverified
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models	Feb 22, 2025	document understanding, Key Information Extraction	Code Available
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding	Feb 20, 2025	document understanding, Optical Character Recognition	Unverified
Assessing Generative AI value in a public sector context: evidence from a field experiment	Feb 13, 2025	document understanding	Unverified
DocMIA: Document-Level Membership Inference Attacks against DocVQA Models	Feb 6, 2025	document understanding, Inference Attack	Code Available
HERITAGE: An End-to-End Web Platform for Processing Korean Historical Documents in Hanja	Jan 21, 2025	document understanding, Machine Translation	Code Available
BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations	Jan 6, 2025	Document AI, document understanding	Unverified
Survey on Question Answering over Visually Rich Documents: Methods, Challenges, and Trends	Jan 4, 2025	document understanding, Question Answering	Unverified
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark	Jan 1, 2025	document understanding, Image Retrieval	Unverified
Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models	Dec 18, 2024	Document Classification, document-image-classification	Unverified
Memory-Augmented Agent Training for Business Document Understanding	Dec 17, 2024	document understanding	Unverified
Learned Compression for Compressed Learning	Dec 12, 2024	Colorization, document understanding	Code Available
DocVLM: Make Your VLM an Efficient Reader	Dec 11, 2024	document understanding, Optical Character Recognition (OCR)	Unverified
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	Dec 6, 2024	document understanding, Hallucination	Code Available
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks	Dec 5, 2024	Code Generation, document understanding	Unverified
MATATA: Weakly Supervised End-to-End MAthematical Tool-Augmented Reasoning for Tabular Applications	Nov 28, 2024	document understanding, Mathematical Reasoning	Unverified
DOGE: Towards Versatile Visual Document Grounding and Referring	Nov 26, 2024	document understanding	Unverified
StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training	Nov 25, 2024	document understanding, Language Modeling	Unverified
Information Extraction from Heterogeneous Documents without Ground Truth Labels using Synthetic Label Generation and Knowledge Distillation	Nov 22, 2024	Anomaly Detection, document understanding	Unverified
Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding	Nov 12, 2024	document understanding, Optical Character Recognition (OCR)	Unverified
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework	Nov 9, 2024	document understanding, Question Answering	Unverified
Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding	Nov 8, 2024	document understanding, Optical Character Recognition (OCR)	Unverified
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding	Nov 7, 2024	document understanding, Optical Character Recognition	Unverified
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection	Nov 5, 2024	document understanding	Unverified
LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding	Nov 2, 2024	document understanding, Question Answering	Unverified

