| Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report | Jun 17, 2024 | document understanding | Code Available | 0 |
| Enhancing Question Answering on Charts Through Effective Pre-training Tasks | Jun 14, 2024 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 |
| DistilDoc: Knowledge Distillation for Visually-Rich Document Applications | Jun 12, 2024 | document-image-classification, Document Image Classification | Unverified | 0 |
| Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use | May 30, 2024 | document understanding, Key Information Extraction | Unverified | 0 |
| Notes on Applicability of GPT-4 to Document Understanding | May 28, 2024 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 |
| Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting | May 21, 2024 | document-image-classification, Document Image Classification | Code Available | 0 |
| GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding | May 6, 2024 | Contrastive Learning, document understanding | Code Available | 0 |
| CREPE: Coordinate-Aware End-to-End Document Parser | May 1, 2024 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 |
| Machine Unlearning for Document Classification | Apr 29, 2024 | Classification, Document Classification | Code Available | 0 |
| Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Apr 29, 2024 | document understanding, GPU | Code Available | 0 |
| A LayoutLMv3-Based Model for Enhanced Relation Extraction in Visually-Rich Documents | Apr 16, 2024 | document understanding, Key Information Extraction | Unverified | 0 |
| Towards Efficient Resume Understanding: A Multi-Granularity Multi-Modal Pre-Training Approach | Apr 13, 2024 | document understanding | Unverified | 0 |
| HRVDA: High-Resolution Visual Document Assistant | Apr 10, 2024 | document understanding | Unverified | 0 |
| LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding | Apr 8, 2024 | Document AI, document understanding | Code Available | 0 |
| BuDDIE: A Business Document Dataset for Multi-task Information Extraction | Apr 5, 2024 | Document Classification, document understanding | Unverified | 0 |
| OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition | Mar 28, 2024 | Decoder, document understanding | Code Available | 0 |
| Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence | Mar 27, 2024 | Document AI, document understanding | Unverified | 0 |
| LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding | Mar 21, 2024 | document-image-classification, Document Image Classification | Unverified | 0 |
| mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding | Mar 19, 2024 | document understanding, Optical Character Recognition (OCR) | Code Available | 0 |
| Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models | Feb 29, 2024 | Contrastive Learning, document understanding | Unverified | 0 |
| 3MVRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding | Feb 28, 2024 | document understanding, Form | Code Available | 0 |
| Read and Think: An Efficient Step-wise Multimodal Language Model for Document Understanding and Reasoning | Feb 26, 2024 | Data Augmentation, document understanding | Unverified | 0 |
| RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning | Feb 19, 2024 | document understanding, Medical Diagnosis | Unverified | 0 |
| LAPDoc: Layout-Aware Prompting for Documents | Feb 15, 2024 | document understanding, Key Information Extraction | Unverified | 0 |
| Financial Report Chunking for Effective Retrieval Augmented Generation | Feb 5, 2024 | Chunking, document understanding | Code Available | 0 |
| LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents | Jan 26, 2024 | 4k, Document AI | Unverified | 0 |
| Long Context Compression with Activation Beacon | Jan 7, 2024 | 4k, document understanding | Code Available | 0 |
| Multimodal weighted graph representation for information extraction from visually rich documents | Jan 5, 2024 | Document Layout Analysis, document understanding | Code Available | 0 |
| DocGraphLM: Documental Graph Language Model for Information Extraction | Jan 5, 2024 | document understanding, Language Modeling | Unverified | 0 |
| OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition | Jan 1, 2024 | Decoder, document understanding | Code Available | 0 |
| On Scaling Up a Multilingual Vision and Language Model | Jan 1, 2024 | document understanding, In-Context Learning | Unverified | 0 |
| DocLLM: A layout-aware generative language model for multimodal document understanding | Dec 31, 2023 | document understanding, Language Modeling | Unverified | 0 |
| SLJP: Semantic Extraction based Legal Judgment Prediction | Dec 13, 2023 | document understanding, Prediction | Unverified | 0 |
| DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding | Nov 20, 2023 | document understanding, Language Modeling | Unverified | 0 |
| Efficient End-to-End Visual Document Understanding with Rationale Distillation | Nov 16, 2023 | document understanding, Image to text | Unverified | 0 |
| DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency | Nov 9, 2023 | document understanding, Key Information Extraction | Unverified | 0 |
| A Multi-Modal Multilingual Benchmark for Document Image Classification | Oct 25, 2023 | Classification, Cross-Lingual Transfer | Unverified | 0 |
| DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond | Oct 19, 2023 | Document AI, Document Layout Analysis | Code Available | 0 |
| Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API | Oct 7, 2023 | Decoder, document understanding | Unverified | 0 |
| ProtoNER: Few shot Incremental Learning for Named Entity Recognition using Prototypical Networks | Oct 3, 2023 | document understanding, Incremental Learning | Unverified | 0 |
| Finding Pragmatic Differences Between Disciplines | Sep 30, 2023 | Diversity, Document Summarization | Unverified | 0 |
| Document Understanding for Healthcare Referrals | Sep 22, 2023 | document understanding, Management | Unverified | 0 |
| SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap | Sep 21, 2023 | Contrastive Learning, document understanding | Code Available | 0 |
| KOSMOS-2.5: A Multimodal Literate Model | Sep 20, 2023 | document understanding, model | Unverified | 0 |
| Long-Range Transformer Architectures for Document Understanding | Sep 11, 2023 | document understanding, Information Retrieval | Code Available | 0 |
| GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification | Sep 11, 2023 | document-image-classification, Document Image Classification | Unverified | 0 |
| Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration | Sep 3, 2023 | Decoder, document understanding | Unverified | 0 |
| Vision Grid Transformer for Document Layout Analysis | Aug 29, 2023 | Document AI, Document Layout Analysis | Code Available | 0 |
| Workshop on Document Intelligence Understanding | Jul 31, 2023 | document understanding, Visual Question Answering (VQA) | Unverified | 0 |
| MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary | Jul 24, 2023 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 |