document understanding

Document understanding involves document classification, layout analysis, information extraction, and DocQA.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 51–75 of 309 papers

Title	Date	Tasks	Status	Hype
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark	Jan 1, 2025	document understandingImage Retrieval	—Unverified	0
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding	Jan 1, 2025	document understandingOptical Character Recognition (OCR)	CodeCode Available	1
LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating	Dec 24, 2024	document understandingQuestion Answering	CodeCode Available	1
Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models	Dec 18, 2024	document understandingImage Captioning	CodeCode Available	1
Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models	Dec 18, 2024	Document Classificationdocument-image-classification	—Unverified	0
Memory-Augmented Agent Training for Business Document Understanding	Dec 17, 2024	document understanding	—Unverified	0
Learned Compression for Compressed Learning	Dec 12, 2024	Colorizationdocument understanding	CodeCode Available	0
DocVLM: Make Your VLM an Efficient Reader	Dec 11, 2024	document understandingOptical Character Recognition (OCR)	—Unverified	0
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	Dec 6, 2024	document understandingHallucination	CodeCode Available	0
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks	Dec 5, 2024	Code Generationdocument understanding	—Unverified	0
MATATA: Weakly Supervised End-to-End MAthematical Tool-Augmented Reasoning for Tabular Applications	Nov 28, 2024	document understandingMathematical Reasoning	—Unverified	0
DOGE: Towards Versatile Visual Document Grounding and Referring	Nov 26, 2024	document understanding	—Unverified	0
StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training	Nov 25, 2024	document understandingLanguage Modeling	—Unverified	0
Information Extraction from Heterogeneous Documents without Ground Truth Labels using Synthetic Label Generation and Knowledge Distillation	Nov 22, 2024	Anomaly Detectiondocument understanding	—Unverified	0
Arabic-Nougat: Fine-Tuning Vision Transformers for Arabic OCR and Markdown Extraction	Nov 19, 2024	document understandingOptical Character Recognition (OCR)	CodeCode Available	2
Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding	Nov 12, 2024	document understandingOptical Character Recognition (OCR)	—Unverified	0
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework	Nov 9, 2024	document understandingQuestion Answering	—Unverified	0
Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding	Nov 8, 2024	document understandingOptical Character Recognition (OCR)	—Unverified	0
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding	Nov 7, 2024	document understandingOptical Character Recognition	—Unverified	0
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection	Nov 5, 2024	document understanding	—Unverified	0
LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding	Nov 2, 2024	document understandingQuestion Answering	—Unverified	0
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding	Oct 25, 2024	Benchmarkingdocument understanding	—Unverified	0
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark	Oct 24, 2024	document understandingVideo Understanding	CodeCode Available	1
Data-driven Coreference-based Ontology Building	Oct 22, 2024	coreference-resolutionCoreference Resolution	CodeCode Available	0
"What is the value of templates?" Rethinking Document Information Extraction Datasets for LLMs	Oct 20, 2024	document understandingKey Information Extraction	—Unverified	0

Show:10 25 50

← PrevPage 3 of 13Next →

No leaderboard results yet.