| Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI | Feb 24, 2025 | document understanding, Multimodal Reasoning | Unverified | 0 |
| OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models | Feb 22, 2025 | document understanding, Key Information Extraction | Code Available | 0 |
| KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding | Feb 20, 2025 | document understanding, Optical Character Recognition | Unverified | 0 |
| Assessing Generative AI value in a public sector context: evidence from a field experiment | Feb 13, 2025 | document understanding | Unverified | 0 |
| DocMIA: Document-Level Membership Inference Attacks against DocVQA Models | Feb 6, 2025 | document understanding, Inference Attack | Code Available | 0 |
| HERITAGE: An End-to-End Web Platform for Processing Korean Historical Documents in Hanja | Jan 21, 2025 | document understanding, Machine Translation | Code Available | 0 |
| BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations | Jan 6, 2025 | Document AI, document understanding | Unverified | 0 |
| Survey on Question Answering over Visually Rich Documents: Methods, Challenges, and Trends | Jan 4, 2025 | document understanding, Question Answering | Unverified | 0 |
| Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark | Jan 1, 2025 | document understanding, Image Retrieval | Unverified | 0 |
| Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models | Dec 18, 2024 | Document Classification, document-image-classification | Unverified | 0 |
| Memory-Augmented Agent Training for Business Document Understanding | Dec 17, 2024 | document understanding | Unverified | 0 |
| Learned Compression for Compressed Learning | Dec 12, 2024 | Colorization, document understanding | Code Available | 0 |
| DocVLM: Make Your VLM an Efficient Reader | Dec 11, 2024 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 |
| Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Dec 6, 2024 | document understanding, Hallucination | Code Available | 0 |
| BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks | Dec 5, 2024 | Code Generation, document understanding | Unverified | 0 |
| MATATA: Weakly Supervised End-to-End MAthematical Tool-Augmented Reasoning for Tabular Applications | Nov 28, 2024 | document understanding, Mathematical Reasoning | Unverified | 0 |
| DOGE: Towards Versatile Visual Document Grounding and Referring | Nov 26, 2024 | document understanding | Unverified | 0 |
| StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training | Nov 25, 2024 | document understanding, Language Modeling | Unverified | 0 |
| Information Extraction from Heterogeneous Documents without Ground Truth Labels using Synthetic Label Generation and Knowledge Distillation | Nov 22, 2024 | Anomaly Detection, document understanding | Unverified | 0 |
| Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding | Nov 12, 2024 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 |
| M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework | Nov 9, 2024 | document understanding, Question Answering | Unverified | 0 |
| Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding | Nov 8, 2024 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 |
| M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | Nov 7, 2024 | document understanding, Optical Character Recognition | Unverified | 0 |
| TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection | Nov 5, 2024 | document understanding | Unverified | 0 |
| LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding | Nov 2, 2024 | document understanding, Question Answering | Unverified | 0 |
| MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding | Oct 25, 2024 | Benchmarking, document understanding | Unverified | 0 |
| Data-driven Coreference-based Ontology Building | Oct 22, 2024 | coreference-resolution, Coreference Resolution | Code Available | 0 |
| "What is the value of templates?" Rethinking Document Information Extraction Datasets for LLMs | Oct 20, 2024 | document understanding, Key Information Extraction | Unverified | 0 |
| Harnessing Webpage UIs for Text-Rich Visual Understanding | Oct 17, 2024 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 |
| ChuLo: Chunk-Level Key Information Representation for Long Document Processing | Oct 14, 2024 | Chunking, Classification | Code Available | 0 |
| ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training | Oct 14, 2024 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 |
| DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models | Oct 4, 2024 | document understanding, Knowledge Distillation | Unverified | 0 |
| DAViD: Domain Adaptive Visually-Rich Document Understanding with Synthetic Insights | Oct 2, 2024 | document understanding, Domain Adaptation | Unverified | 0 |
| Leveraging Long-Context Large Language Models for Multi-Document Understanding and Summarization in Enterprise Applications | Sep 27, 2024 | Diversity, Document Summarization | Unverified | 0 |
| DocMamba: Efficient Document Pre-training with State Space Model | Sep 18, 2024 | document understanding | Unverified | 0 |
| Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5 | Sep 17, 2024 | document understanding, Transfer Learning | Unverified | 0 |
| Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network | Sep 11, 2024 | Document Layout Analysis, document understanding | Code Available | 0 |
| mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding | Sep 5, 2024 | document understanding, GPU | Code Available | 0 |
| ViRED: Prediction of Visual Relations in Engineering Drawings | Sep 2, 2024 | Decoder, document understanding | Unverified | 0 |
| The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts | Aug 31, 2024 | document understanding, token-classification | Unverified | 0 |
| SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding | Aug 27, 2024 | document understanding | Unverified | 0 |
| Building and better understanding vision-language models: insights and future directions | Aug 22, 2024 | document understanding | Unverified | 0 |
| Arctic-TILT. Business Document Understanding at Sub-Billion Scale | Aug 8, 2024 | document understanding, GPU | Unverified | 0 |
| Deep Learning based Visually Rich Document Content Understanding: A Survey | Aug 2, 2024 | Deep Learning, document understanding | Unverified | 0 |
| Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review | Jul 23, 2024 | Deep Learning, document understanding | Unverified | 0 |
| Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding | Jul 19, 2024 | document understanding, Informativeness | Code Available | 0 |
| NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition | Jul 16, 2024 | Decoder, document understanding | Unverified | 0 |
| Hypergraph based Understanding for Document Semantic Entity Recognition | Jul 9, 2024 | document understanding | Code Available | 0 |
| DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming | Jun 27, 2024 | document understanding | Unverified | 0 |
| DrVideo: Document Retrieval Based Long Video Understanding | Jun 18, 2024 | document understanding, EgoSchema | Unverified | 0 |