| WordScape: a Pipeline to extract multilingual, visually rich Documents with Layout Annotations from Web Crawl Data | Dec 15, 2023 | document understanding, Question Answering | Code Available | 1 |
| Privacy-Aware Document Visual Question Answering | Dec 15, 2023 | document understanding, Federated Learning | Code Available | 1 |
| SLJP: Semantic Extraction based Legal Judgment Prediction | Dec 13, 2023 | document understanding, Prediction | Unverified | 0 |
| Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs | Nov 22, 2023 | document understanding, Instruction Following | Code Available | 1 |
| DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding | Nov 20, 2023 | document understanding, Language Modeling | Unverified | 0 |
| Efficient End-to-End Visual Document Understanding with Rationale Distillation | Nov 16, 2023 | document understanding, Image to text | Unverified | 0 |
| DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency | Nov 9, 2023 | document understanding, Key Information Extraction | Unverified | 0 |
| A Multi-Modal Multilingual Benchmark for Document Image Classification | Oct 25, 2023 | Classification, Cross-Lingual Transfer | Unverified | 0 |
| DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading | Oct 23, 2023 | Document AI, document understanding | Code Available | 1 |
| DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond | Oct 19, 2023 | Document AI, Document Layout Analysis | Code Available | 0 |
| Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API | Oct 7, 2023 | Decoder, document understanding | Unverified | 0 |
| ProtoNER: Few shot Incremental Learning for Named Entity Recognition using Prototypical Networks | Oct 3, 2023 | document understanding, Incremental Learning | Unverified | 0 |
| Finding Pragmatic Differences Between Disciplines | Sep 30, 2023 | Diversity, Document Summarization | Unverified | 0 |
| Document Understanding for Healthcare Referrals | Sep 22, 2023 | document understanding, Management | Unverified | 0 |
| SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap | Sep 21, 2023 | Contrastive Learning, document understanding | Code Available | 0 |
| KOSMOS-2.5: A Multimodal Literate Model | Sep 20, 2023 | document understanding, model | Unverified | 0 |
| GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification | Sep 11, 2023 | document-image-classification, Document Image Classification | Unverified | 0 |
| Long-Range Transformer Architectures for Document Understanding | Sep 11, 2023 | document understanding, Information Retrieval | Code Available | 0 |
| Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration | Sep 3, 2023 | Decoder, document understanding | Unverified | 0 |
| Vision Grid Transformer for Document Layout Analysis | Aug 29, 2023 | Document AI, Document Layout Analysis | Code Available | 0 |
| Enhancing Visually-Rich Document Understanding via Layout Structure Modeling | Aug 15, 2023 | document understanding | Code Available | 1 |
| Workshop on Document Intelligence Understanding | Jul 31, 2023 | document understanding, Visual Question Answering (VQA) | Unverified | 0 |
| MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary | Jul 24, 2023 | document understanding, Optical Character Recognition (OCR) | Unverified | 0 |
| A Survey and Approach to Chart Classification | Jul 9, 2023 | Chart Understanding, Classification | Unverified | 0 |
| mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding | Jul 4, 2023 | document understanding, Language Modeling | Code Available | 0 |
| DocumentNet: Bridging the Data Gap in Document Pre-Training | Jun 15, 2023 | document understanding, Entity Retrieval | Unverified | 0 |
| DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents | Jun 9, 2023 | Contrastive Learning, document understanding | Code Available | 1 |
| Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models | Jun 5, 2023 | document understanding, Question Answering | Code Available | 0 |
| DocFormerv2: Local Features for Document Understanding | Jun 2, 2023 | Decoder, document understanding | Code Available | 1 |
| Table Detection for Visually Rich Document Images | May 30, 2023 | document understanding, object-detection | Code Available | 0 |
| LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding | May 30, 2023 | document-image-classification, Document Image Classification | Unverified | 0 |
| PaLI-X: On Scaling up a Multilingual Vision and Language Model | May 29, 2023 | Chart Question Answering, document understanding | Code Available | 1 |
| Pre-training Meets Clustering: A Hybrid Extractive Multi-document Summarization Model | May 25, 2023 | Clustering, Document Summarization | Code Available | 0 |
| AWESOME: GPU Memory-constrained Long Document Summarization using Memory Mechanism and Global Salient Content | May 24, 2023 | Document Summarization, document understanding | Unverified | 0 |
| Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models | May 24, 2023 | document understanding, Image Captioning | Code Available | 1 |
| DUBLIN -- Document Understanding By Language-Image Network | May 23, 2023 | Document Classification, document understanding | Unverified | 0 |
| Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding | May 19, 2023 | document understanding | Unverified | 0 |
| Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding | May 16, 2023 | Decoder, document understanding | Unverified | 0 |
| DLUE: Benchmarking Document Language Understanding | May 16, 2023 | Benchmarking, Document Classification | Unverified | 0 |
| M^6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | May 15, 2023 | Articles, Document Layout Analysis | Code Available | 0 |
| Document Understanding Dataset and Evaluation (DUDE) | May 15, 2023 | Document AI, document understanding | Code Available | 1 |
| Two to Five Truths in Non-Negative Matrix Factorization | May 6, 2023 | Clustering, document understanding | Unverified | 0 |
| Revisiting Table Detection Datasets for Visually Rich Documents | May 4, 2023 | document understanding, object-detection | Unverified | 0 |
| FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction | May 4, 2023 | Contrastive Learning, document understanding | Unverified | 0 |
| LineFormer: Rethinking Line Chart Data Extraction as Instance Segmentation | May 3, 2023 | Data Visualization, document understanding | Code Available | 1 |
| CCpdf: Building a High Quality Corpus for Visually Rich Documents from Web Crawl Data | Apr 28, 2023 | document understanding, Language Modeling | Code Available | 1 |
| Information Redundancy and Biases in Public Document Information Extraction Benchmarks | Apr 28, 2023 | document understanding, Key Information Extraction | Code Available | 0 |
| What Makes a Good Dataset for Symbol Description Reading? | Apr 17, 2023 | document understanding, Math | Unverified | 0 |
| PDFVQA: A New Dataset for Real-World VQA on PDF Documents | Apr 13, 2023 | document understanding, Key Information Extraction | Unverified | 0 |
| Is ChatGPT A Good Keyphrase Generator? A Preliminary Study | Mar 23, 2023 | Diversity, document understanding | Code Available | 0 |