Text Spotting

Scene text spotting combines scene text detection and scene text recognition in a single end-to-end system: it localizes text in natural images "in the wild" and transcribes it.

Papers

Showing 1–50 of 112 papers

Title	Date	Tasks	Status	Hype
Text-Aware Image Restoration with Diffusion Models	Jun 11, 2025	Denoising, Hallucination	Unverified	0
GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking	May 28, 2025	Benchmarking, Text Spotting	Code Available	1
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting	Apr 14, 2025	Domain Adaptation, Text Detection	Code Available	1
TextInPlace: Indoor Visual Place Recognition in Repetitive Structures with Scene Text Spotting and Verification	Mar 9, 2025	Robot Navigation, STS	Code Available	1
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models	Feb 22, 2025	document understanding, Key Information Extraction	Unverified	0
CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR	Jan 1, 2025	All, Optical Character Recognition	Unverified	0
Hear the Scene: Audio-Enhanced Text Spotting	Dec 27, 2024	Text Spotting	Unverified	0
InstructOCR: Instruction Boosting Scene Text Spotting	Dec 20, 2024	Optical Character Recognition (OCR), Text Spotting	Code Available	0
Arbitrary Reading Order Scene Text Spotter with Local Semantics Guidance	Dec 13, 2024	Scene Text Recognition, Text Spotting	Unverified	0
HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction	Nov 2, 2024	Image Reconstruction, Optical Character Recognition (OCR)	Unverified	0
FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting	Aug 27, 2024	Benchmarking, Decoder	Code Available	0
DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training	Aug 1, 2024	Denoising, Graph Matching	Code Available	1
WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting	Jul 28, 2024	Contrastive Learning, Text Spotting	Code Available	0
CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction	Jul 23, 2024	Image Inpainting, Image Restoration	Unverified	0
Block-level Text Spotting with LLMs	Jun 19, 2024	Language Modeling, Language Modelling	Unverified	0
LOGO: Video Text Spotting with Language Collaboration and Glyph Perception Model	May 29, 2024	Position, Text Spotting	Unverified	0
VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization	Apr 30, 2024	Domain Adaptation, Domain Generalization	Code Available	2
Mixed Text Recognition with Efficient Parameter Fine-Tuning and Transformer	Apr 19, 2024	Decoder, Optical Character Recognition	Unverified	0
Bridging the Gap Between End-to-End and Two-Step Text Spotting	Apr 6, 2024	Text Spotting	Code Available	2
Ensemble Learning for Vietnamese Scene Text Spotting in Urban Environments	Apr 1, 2024	Ensemble Learning, Text Detection	Unverified	0
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition	Mar 28, 2024	Decoder, document understanding	Unverified	0
TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model	Mar 15, 2024	Language Modeling, Language Modelling	Unverified	0
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document	Mar 7, 2024	document understanding, Key Information Extraction	Code Available	5
Efficiently Leveraging Linguistic Priors for Scene Text Spotting	Feb 27, 2024	Scene Text Recognition, Text Detection	Unverified	0
Beyond the Mud: Datasets and Benchmarks for Computer Vision in Off-Road Racing	Feb 12, 2024	Optical Character Recognition, Optical Character Recognition (OCR)	Unverified	0
SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting	Jan 15, 2024	Text Detection, Text Spotting	Code Available	1
GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching	Jan 13, 2024	Text Detection, Text Spotting	Code Available	1
Watermark Text Pattern Spotting in Document Images	Jan 10, 2024	Text Spotting	Unverified	0
GloTSFormer: Global Video Text Spotting Transformer	Jan 8, 2024	Text Spotting	Code Available	0
Inverse-like Antagonistic Scene Text Spotting via Reading-Order Estimation and Dynamic Sampling	Jan 8, 2024	Text Detection, Text Spotting	Unverified	0
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition	Jan 1, 2024	Decoder, document understanding	Unverified	0
Word length-aware text spotting: Enhancing detection and recognition in dense text image	Dec 25, 2023	Text Detection, Text Spotting	Unverified	0
Parrot Captions Teach CLIP to Spot Text	Dec 21, 2023	Representation Learning, text similarity	Code Available	1
Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis	Oct 25, 2023	Text Spotting	Code Available	2
Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance	Oct 2, 2023	Scene Text Detection, Text Detection	Code Available	0
Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes	Oct 1, 2023	Super-Resolution, Text Spotting	Unverified	0
STEP -- Towards Structured Scene-Text Spotting	Sep 5, 2023	Optical Character Recognition (OCR), Scene Text Detection	Code Available	0
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration	Sep 3, 2023	Decoder, document understanding	Unverified	0
Deformation Robust Text Spotting with Geometric Prior	Aug 31, 2023	Diversity, Text Detection	Unverified	0
ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer	Aug 20, 2023	Decoder, Text Detection	Code Available	1
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision	Jun 6, 2023	Decoder, Scene Text Detection	Unverified	0
DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting	May 31, 2023	Decoder, Scene Text Detection	Code Available	2
FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation	May 5, 2023	Optical Flow Estimation, Text Spotting	Code Available	1
Scalable Mask Annotation for Video Text Spotting	May 2, 2023	Text Spotting	Code Available	1
ICDAR 2023 Video Text Reading Competition for Dense and Small Text	Apr 10, 2023	Task 2, Text Detection	Unverified	0
Towards Unified Scene Text Spotting based on Sequence Generation	Apr 7, 2023	Text Spotting	Code Available	1
VGTS: Visually Guided Text Spotting for Novel Categories in Historical Manuscripts	Apr 3, 2023	Geometric Matching, Metric Learning	Unverified	0
Video text tracking for dense and small text based on pp-yoloe-r and sort algorithm	Mar 31, 2023	object-detection, Object Detection	Unverified	0
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild	Mar 23, 2023	Text Spotting	Unverified	0
A3S: Adversarial learning of semantic representations for Scene-Text Spotting	Feb 21, 2023	Text Spotting	Unverified	0

Datasets: ICDAR 2015, Total-Text, SCUT-CTW1500, Inverse-Text

Benchmark Results

ICDAR 2015

#	Model	Metric	Claimed	Verified	Status
1	UNITS	F-measure (%) - Strong Lexicon	89	—	Unverified
2	DeepSolo (ViTAEv2-S, TextOCR)	F-measure (%) - Strong Lexicon	88.1	—	Unverified
3	DeepSolo (ResNet-50, TextOCR)	F-measure (%) - Strong Lexicon	88	—	Unverified
4	DeepSolo (ResNet-50)	F-measure (%) - Strong Lexicon	86.8	—	Unverified
5	SRTS	F-measure (%) - Strong Lexicon	85.6	—	Unverified
6	TESTR	F-measure (%) - Strong Lexicon	85.2	—	Unverified
7	A3S	F-measure (%) - Strong Lexicon	84.8	—	Unverified
8	GLASS	F-measure (%) - Strong Lexicon	84.7	—	Unverified
9	SwinTextSpotter	F-measure (%) - Strong Lexicon	83.9	—	Unverified
10	FOTS	F-measure (%) - Strong Lexicon	83.6	—	Unverified

Total-Text

#	Model	Metric	Claimed	Verified	Status
1	DeepSolo (ViTAEv2-S, TextOCR)	F-measure (%) - No Lexicon	83.6	—	Unverified
2	DeepSolo (ResNet-50, TextOCR)	F-measure (%) - No Lexicon	82.5	—	Unverified
3	DeepSolo (ResNet-50)	F-measure (%) - No Lexicon	79.7	—	Unverified
4	A3S	F-measure (%) - No Lexicon	79.4	—	Unverified
5	UNITS	F-measure (%) - No Lexicon	78.7	—	Unverified
6	GLASS	F-measure (%) - No Lexicon	76.6	—	Unverified
7	DEER	F-measure (%) - No Lexicon	74.8	—	Unverified
8	SwinTextSpotter	F-measure (%) - No Lexicon	74.3	—	Unverified
9	TESTR	F-measure (%) - No Lexicon	73.3	—	Unverified
10	MANGO	F-measure (%) - No Lexicon	72.9	—	Unverified

SCUT-CTW1500

#	Model	Metric	Claimed	Verified	Status
1	A3S	F-measure (%) - No Lexicon	64.4	—	Unverified
2	DeepSolo (ResNet-50)	F-measure (%) - No Lexicon	64.2	—	Unverified
3	SPTS	F-measure (%) - No Lexicon	63.6	—	Unverified
4	ABINet++	F-measure (%) - No Lexicon	60.2	—	Unverified
5	TPSNet	F-measure (%) - No Lexicon	59.7	—	Unverified
6	MANGO	F-measure (%) - No Lexicon	58.9	—	Unverified
7	ABCNet v2	F-measure (%) - No Lexicon	57.5	—	Unverified
8	TextPerceptron	F-measure (%) - No Lexicon	57	—	Unverified
9	TESTR	F-measure (%) - No Lexicon	56	—	Unverified
10	SwinTextSpotter	F-measure (%) - No Lexicon	51.8	—	Unverified

Inverse-Text

#	Model	Metric	Claimed	Verified	Status
1	DeepSolo (ViTAEv2-S, TextOCR)	F-measure (%) - No Lexicon	68.8	—	Unverified
2	DeepSolo (ResNet-50, TextOCR)	F-measure (%) - No Lexicon	64.6	—	Unverified
3	SwinTextSpotter	F-measure (%) - No Lexicon	55.4	—	Unverified
4	DeepSolo (ResNet-50)	F-measure (%) - No Lexicon	48.5	—	Unverified
5	MaskTextSpotter v2	F-measure (%) - No Lexicon	39	—	Unverified
6	SPTS	F-measure (%) - No Lexicon	38.3	—	Unverified
7	ABCNet v2	F-measure (%) - No Lexicon	34.5	—	Unverified
8	TESTR	F-measure (%) - No Lexicon	34.2	—	Unverified
9	ABCNet	F-measure (%) - No Lexicon	22.2	—	Unverified
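
The F-measure reported in the tables above is the harmonic mean of precision and recall. The sketch below shows that arithmetic only. It assumes the number of correctly spotted words has already been counted, and it does not reproduce any benchmark's matching rules or lexicon handling. The function name, parameter names, and example counts are hypothetical and not taken from any listed paper.

```python
def f_measure(true_positives: int, num_predictions: int, num_ground_truth: int) -> float:
    """Return the F-measure in percent from raw match counts.

    Assumption: true_positives counts predictions that a benchmark's own
    protocol accepted as correct (location and transcription both matched).
    """
    if num_predictions == 0 or num_ground_truth == 0:
        return 0.0
    precision = true_positives / num_predictions
    recall = true_positives / num_ground_truth
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall, scaled to a percentage.
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
score = f_measure(true_positives=80, num_predictions=100, num_ground_truth=100)
print(f"{score:.1f}")
```

Because the harmonic mean is dominated by the smaller of the two values, a spotter cannot reach a high F-measure by over-predicting (high recall, low precision) or by predicting only its most confident words (high precision, low recall).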