Key Information Extraction

Key Information Extraction (KIE) aims to extract structured information (e.g. key-value pairs) from form-style documents (e.g. invoices), and is an important step towards intelligent document understanding.
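
To make the task's input and output concrete, here is a minimal sketch of key-value extraction from the text of an invoice. It is a rule-based toy, not the method of any paper listed on this page: the invoice text, the field names, and the regular expressions are all illustrative assumptions, and the models below learn this mapping from text, layout, and image features instead of hand-written patterns.

```python
import re

# Hypothetical OCR output for an invoice (an assumption for illustration only).
INVOICE_TEXT = """\
Invoice No: INV-1042
Date: 2023-05-17
Total: 118.40
"""

# One hand-written pattern per key; each captures the value following the key.
# Field names and patterns are hypothetical, not taken from any benchmark.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice No:\s*(\S+)"),
    "date": re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total:\s*([\d.]+)"),
}


def extract_key_values(text: str) -> dict:
    """Return a dict of the fields whose pattern matches somewhere in the text."""
    result = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            result[key] = match.group(1)
    return result


print(extract_key_values(INVOICE_TEXT))
# {'invoice_number': 'INV-1042', 'date': '2023-05-17', 'total': '118.40'}
```

Fields with no match are simply omitted from the result, so a document without any of the expected keys yields an empty dict.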

Papers

Showing 1-50 of 74 papers

| Title | Status | Hype |
| --- | --- | --- |
| TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document | Code | 5 |
| A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding | Code | 2 |
| OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models | Code | 2 |
| LayoutLM: Pre-training of Text and Layout for Document Image Understanding | Code | 2 |
| LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding | Code | 2 |
| ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding | Code | 1 |
| Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and In-depth Evaluation | Code | 1 |
| Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding | Code | 1 |
| Form-NLU: Dataset for the Form Natural Language Understanding | Code | 1 |
| DocILE Benchmark for Document Information Localization and Extraction | Code | 1 |
| Key Information Extraction From Documents: Evaluation And Generator | Code | 1 |
| Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction | Code | 1 |
| BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents | Code | 1 |
| KVP10k : A Comprehensive Dataset for Key-Value Pair Extraction in Business Documents | Code | 1 |
| GenKIE: Robust Generative Multimodal Document Key Information Extraction | Code | 1 |
| LAMBERT: Layout-Aware (Language) Modeling for information extraction | Code | 1 |
| Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks | Code | 1 |
| PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction | Code | 1 |
| PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks | Code | 1 |
| XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser | Code | 0 |
| Multimodal weighted graph representation for information extraction from visually rich documents. | Code | 0 |
| GraphRevisedIE: Multimodal Information Extraction with Graph-Revised Network | Code | 0 |
| Class-Agnostic Region-of-Interest Matching in Document Images | Code | 0 |
| DoSA : A System to Accelerate Annotations on Business Documents with Human-in-the-Loop | Code | 0 |
| Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network | Code | 0 |
| Information Redundancy and Biases in Public Document Information Extraction Benchmarks | Code | 0 |
| Automatic Metadata Extraction Incorporating Visual Features from Scanned Electronic Theses and Dissertations | Code | 0 |
| AMuRD: Annotated Arabic-English Receipt Dataset for Key Information Extraction and Classification | Code | 0 |
| Different Tastes of Entities: Investigating Human Label Variation in Named Entity Annotations | Code | 0 |
| LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | Code | 0 |
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Code | 0 |
| RDU: A Region-based Approach to Form-style Document Understanding | | 0 |
| RealKIE: Five Novel Datasets for Enterprise Key Information Extraction | | 0 |
| Relational Representation Learning in Visually-Rich Documents | | 0 |
| Comparison of biomedical relationship extraction methods and models for knowledge graph creation | | 0 |
| Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use | | 0 |
| SIMARA: a database for key-value information extraction from full pages | | 0 |
| Spatial Dual-Modality Graph Reasoning for Key Information Extraction | | 0 |
| UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-like Documents | | 0 |
| ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents | | 0 |
| ViBERTgrid BiLSTM-CRF: Multimodal Key Information Extraction from Unstructured Financial Documents | | 0 |
| VKIE: The Application of Key Information Extraction on Video Text | | 0 |
| "What is the value of templates?" Rethinking Document Information Extraction Datasets for LLMs | | 0 |
| See then Tell: Enhancing Key Information Extraction with Vision Grounding | | 0 |
| A LayoutLMv3-Based Model for Enhanced Relation Extraction in Visually-Rich Documents | | 0 |
| CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy | | 0 |
| Construction of a Syntactic Analysis Map for Yi Shui School through Text Mining and Natural Language Processing Research | | 0 |
| Data Efficient Training of a U-Net Based Architecture for Structured Documents Localization | | 0 |
| Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review | | 0 |
| DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency | | 0 |

Benchmark Results

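| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | RORE (GeoLayoutLM) | F1 | 98.52 | | Unverified |
| 2 | GeoLayoutLM | F1 | 97.97 | | Unverified |
| 3 | LayoutLMv3 Large | F1 | 97.46 | | Unverified |
| 4 | LayoutMask (large) | F1 | 97.19 | | Unverified |
| 5 | LayoutMask (base) | F1 | 96.99 | | Unverified |
| 6 | TPP (LayoutMask) | F1 | 96.92 | | Unverified |
| 7 | LILT | F1 | 96.07 | | Unverified |
| 8 | LayoutLMv2 LARGE | F1 | 96.01 | | Unverified |
| 9 | LayoutLMv2 BASE | F1 | 94.95 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | LayoutLMv2 LARGE (Excluding OCR mismatch) | F1 | 97.81 | | Unverified |
| 2 | RORE (GeoLayoutLM) | F1 | 96.97 | | Unverified |
| 3 | LayoutLMv2 LARGE | F1 | 96.61 | | Unverified |
| 4 | LayoutLMv2 BASE | F1 | 96.25 | | Unverified |
| 5 | ChatGPT 3.5 SpatialFormat | Accuracy | 77 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | LayoutLMv2 LARGE | F1 | 85.2 | | Unverified |
| 2 | LayoutLMv2 BASE | F1 | 83.3 | | Unverified |
| 3 | LAMBERT (75M) | F1 | 80.42 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | DAN | F1 (%) | 95.05 | | Unverified |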
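
Most entries above report F1. As a rough illustration only, the sketch below computes an F1 score over extracted (field, value) pairs, assuming exact-match scoring on a single document. The page does not state how each benchmark scores predictions, so the matching rule, the function name `entity_f1`, and the sample data are all assumptions, not the protocol behind any listed number.

```python
def entity_f1(predicted: set, gold: set) -> float:
    """F1 over exact-match (field, value) pairs; returns 0.0 when nothing matches."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # share of predictions that are correct
    recall = true_positives / len(gold)          # share of gold pairs that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold annotations and model output for one invoice.
gold = {("invoice_number", "INV-1042"), ("date", "2023-05-17"), ("total", "118.40")}
predicted = {("invoice_number", "INV-1042"), ("date", "2023-05-17"), ("total", "118.00")}

print(round(100 * entity_f1(predicted, gold), 2))  # 66.67
```

Here two of three predicted pairs match the gold set, so precision and recall are both 2/3 and F1 is 2/3, about 66.67 on the percentage scale used in the tables.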