| Docopilot: Improving Multimodal Models for Document-Level Understanding | Jan 1, 2025 | document understanding, RAG | Code Available | 1 |
| DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Jan 1, 2025 | document understanding, Optical Character Recognition (OCR) | Code Available | 1 |
| LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating | Dec 24, 2024 | document understanding, Question Answering | Code Available | 1 |
| Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models | Dec 18, 2024 | Document Classification, document-image-classification | Unverified | 0 |
| Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models | Dec 18, 2024 | document understanding, Image Captioning | Code Available | 1 |
| Memory-Augmented Agent Training for Business Document Understanding | Dec 17, 2024 | document understanding | Unverified | 0 |
| Learned Compression for Compressed Learning | Dec 12, 2024 | Colorization, document understanding | Code Available | 0 |
| DocVLM: Make Your VLM an Efficient Reader | Dec 11, 2024 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 |
| Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Dec 6, 2024 | document understanding, Hallucination | Code Available | 0 |
| BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks | Dec 5, 2024 | Code Generation, document understanding | Unverified | 0 |
| MATATA: Weakly Supervised End-to-End MAthematical Tool-Augmented Reasoning for Tabular Applications | Nov 28, 2024 | document understanding, Mathematical Reasoning | Unverified | 0 |
| DOGE: Towards Versatile Visual Document Grounding and Referring | Nov 26, 2024 | document understanding | Unverified | 0 |
| StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training | Nov 25, 2024 | document understanding, Language Modeling | Unverified | 0 |
| Information Extraction from Heterogeneous Documents without Ground Truth Labels using Synthetic Label Generation and Knowledge Distillation | Nov 22, 2024 | Anomaly Detection, document understanding | Unverified | 0 |
| Arabic-Nougat: Fine-Tuning Vision Transformers for Arabic OCR and Markdown Extraction | Nov 19, 2024 | document understanding, Optical Character Recognition (OCR) | Code Available | 2 |
| Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding | Nov 12, 2024 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 |
| M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework | Nov 9, 2024 | document understanding, Question Answering | Unverified | 0 |
| Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding | Nov 8, 2024 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 |
| M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | Nov 7, 2024 | document understanding, Optical Character Recognition | Unverified | 0 |
| TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection | Nov 5, 2024 | document understanding | Unverified | 0 |
| LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding | Nov 2, 2024 | document understanding, Question Answering | Unverified | 0 |
| MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding | Oct 25, 2024 | Benchmarking, document understanding | Unverified | 0 |
| CAMEL-Bench: A Comprehensive Arabic LMM Benchmark | Oct 24, 2024 | document understanding, Video Understanding | Code Available | 1 |
| Data-driven Coreference-based Ontology Building | Oct 22, 2024 | coreference-resolution, Coreference Resolution | Code Available | 0 |
| "What is the value of templates?" Rethinking Document Information Extraction Datasets for LLMs | Oct 20, 2024 | document understanding, Key Information Extraction | Unverified | 0 |
| Harnessing Webpage UIs for Text-Rich Visual Understanding | Oct 17, 2024 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 |
| DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception | Oct 16, 2024 | Document Layout Analysis, document understanding | Code Available | 9 |
| ChuLo: Chunk-Level Key Information Representation for Long Document Processing | Oct 14, 2024 | Chunking, Classification | Code Available | 0 |
| ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training | Oct 14, 2024 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 |
| LLMMapReduce: Simplified Long-Sequence Processing using Large Language Models | Oct 12, 2024 | document understanding | Code Available | 4 |
| PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling | Oct 8, 2024 | document understanding, Language Modeling | Code Available | 2 |
| DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models | Oct 4, 2024 | document understanding, Knowledge Distillation | Unverified | 0 |
| DAViD: Domain Adaptive Visually-Rich Document Understanding with Synthetic Insights | Oct 2, 2024 | document understanding, Domain Adaptation | Unverified | 0 |
| Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding | Sep 29, 2024 | document understanding, Entity Linking | Code Available | 1 |
| Leveraging Long-Context Large Language Models for Multi-Document Understanding and Summarization in Enterprise Applications | Sep 27, 2024 | Diversity, Document Summarization | Unverified | 0 |
| Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution | Sep 19, 2024 | document understanding, Video Question Answering | Code Available | 3 |
| DocMamba: Efficient Document Pre-training with State Space Model | Sep 18, 2024 | document understanding | Unverified | 0 |
| Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5 | Sep 17, 2024 | document understanding, Transfer Learning | Unverified | 0 |
| One missing piece in Vision and Language: A Survey on Comics Understanding | Sep 14, 2024 | document understanding, image-classification | Code Available | 2 |
| Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network | Sep 11, 2024 | Document Layout Analysis, document understanding | Code Available | 0 |
| mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding | Sep 5, 2024 | document understanding, GPU | Code Available | 0 |
| ViRED: Prediction of Visual Relations in Engineering Drawings | Sep 2, 2024 | Decoder, document understanding | Unverified | 0 |
| The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts | Aug 31, 2024 | document understanding, token-classification | Unverified | 0 |
| DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Aug 27, 2024 | document understanding, Optical Character Recognition (OCR) | Code Available | 1 |
| SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding | Aug 27, 2024 | document understanding | Unverified | 0 |
| Building and better understanding vision-language models: insights and future directions | Aug 22, 2024 | document understanding | Unverified | 0 |
| Arctic-TILT. Business Document Understanding at Sub-Billion Scale | Aug 8, 2024 | document understanding, GPU | Unverified | 0 |
| Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid | Aug 4, 2024 | document understanding | Code Available | 5 |
| Deep Learning based Visually Rich Document Content Understanding: A Survey | Aug 2, 2024 | Deep Learning, document understanding | Unverified | 0 |
| Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review | Jul 23, 2024 | Deep Learning, document understanding | Unverified | 0 |