Key Information Extraction

Key Information Extraction (KIE) aims to extract structured information (e.g., key-value pairs) from form-style documents (e.g., invoices), an important step towards intelligent document understanding.

Papers

Showing 26–50 of 74 papers

Title	Date	Tasks	Status	Hype
UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-like Documents	Jan 17, 2024	Decoder, Form	Unverified	0
PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction	Jan 7, 2024	Key Information Extraction, Key-value Pair Extraction	Code Available	1
Multimodal weighted graph representation for information extraction from visually rich documents.	Jan 5, 2024	Document Layout Analysis, document understanding	Code Available	0
OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition	Jan 1, 2024	Decoder, document understanding	Unverified	0
DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency	Nov 9, 2023	document understanding, Key Information Extraction	Unverified	0
Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and In-depth Evaluation	Oct 25, 2023	Handwritten Text Recognition, Key Information Extraction	Code Available	1
GenKIE: Robust Generative Multimodal Document Key Information Extraction	Oct 24, 2023	Decoder, Key Information Extraction	Code Available	1
VKIE: The Application of Key Information Extraction on Video Text	Oct 18, 2023	Key Information Extraction	Unverified	0
Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction	Oct 17, 2023	Entity Linking, Key Information Extraction	Code Available	1
PrIeD-KIE: Towards Privacy Preserved Document Key Information Extraction	Oct 5, 2023	Document AI, Federated Learning	Unverified	0
Data Efficient Training of a U-Net Based Architecture for Structured Documents Localization	Oct 2, 2023	Decoder, Deep Learning	Unverified	0
AMuRD: Annotated Arabic-English Receipt Dataset for Key Information Extraction and Classification	Sep 18, 2023	Classification, Key Information Extraction	Code Available	0
PPN: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts	Jul 20, 2023	Key Information Extraction	Unverified	0
End-to-End Document Classification and Key Information Extraction using Assignment Optimization	Jun 1, 2023	Classification, Document Classification	Unverified	0
LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding	May 30, 2023	document-image-classification, Document Image Classification	Unverified	0
DUBLIN -- Document Understanding By Language-Image Network	May 23, 2023	Document Classification, document understanding	Unverified	0
OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models	May 13, 2023	Key Information Extraction, Nutrition	Code Available	2
Information Redundancy and Biases in Public Document Information Extraction Benchmarks	Apr 28, 2023	document understanding, Key Information Extraction	Code Available	0
SIMARA: a database for key-value information extraction from full pages	Apr 26, 2023	Handwriting Recognition, Handwritten Text Recognition	Unverified	0
Information Extraction from Documents: Question Answering vs Token Classification in real-world setups	Apr 21, 2023	Classification, Few-Shot Learning	Unverified	0
GeoLayoutLM: Geometric Pre-training for Visual Information Extraction	Apr 21, 2023	Document AI, entity_extraction	Unverified	0
PDFVQA: A New Dataset for Real-World VQA on PDF Documents	Apr 13, 2023	document understanding, Key Information Extraction	Unverified	0
Form-NLU: Dataset for the Form Natural Language Understanding	Apr 4, 2023	4k, Form	Code Available	1
DocILE Benchmark for Document Information Localization and Extraction	Feb 11, 2023	Key Information Extraction, Unsupervised Pre-training	Code Available	1
DoSA : A System to Accelerate Annotations on Business Documents with Human-in-the-Loop	Nov 9, 2022	Document AI, Key Information Extraction	Code Available	0

Datasets: CORD, SROIE, Kleister NDA, SIMARA

Benchmark Results

CORD

#	Model	Metric	Claimed	Verified	Status
1	RORE (GeoLayoutLM)	F1	98.52	—	Unverified
2	GeoLayoutLM	F1	97.97	—	Unverified
3	LayoutLMv3 Large	F1	97.46	—	Unverified
4	LayoutMask (large)	F1	97.19	—	Unverified
5	LayoutMask (base)	F1	96.99	—	Unverified
6	TPP (LayoutMask)	F1	96.92	—	Unverified
7	LiLT	F1	96.07	—	Unverified
8	LayoutLMv2LARGE	F1	96.01	—	Unverified
9	LayoutLMv2BASE	F1	94.95	—	Unverified

SROIE

#	Model	Metric	Claimed	Verified	Status
1	LayoutLMv2LARGE (Excluding OCR mismatch)	F1	97.81	—	Unverified
2	RORE (GeoLayoutLM)	F1	96.97	—	Unverified
3	LayoutLMv2LARGE	F1	96.61	—	Unverified
4	LayoutLMv2BASE	F1	96.25	—	Unverified
5	ChatGPT 3.5 SpatialFormat	Accuracy	77	—	Unverified

Kleister NDA

#	Model	Metric	Claimed	Verified	Status
1	LayoutLMv2LARGE	F1	85.2	—	Unverified
2	LayoutLMv2BASE	F1	83.3	—	Unverified
3	LAMBERT (75M)	F1	80.42	—	Unverified

SIMARA

#	Model	Metric	Claimed	Verified	Status
1	DAN	F1 (%)	95.05	—	Unverified