SOTAVerified

Text Spotting

Scene text spotting combines scene text detection and scene text recognition in a single end-to-end system: it locates text in natural images and transcribes it, that is, it reads text in the wild.

Papers

Showing 1-50 of 112 papers

| Title | Status | Hype |
|---|---|---|
| Text-Aware Image Restoration with Diffusion Models | | 0 |
| GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking | Code | 1 |
| SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting | Code | 1 |
| TextInPlace: Indoor Visual Place Recognition in Repetitive Structures with Scene Text Spotting and Verification | Code | 1 |
| OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models | Code | 0 |
| CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR | | 0 |
| Hear the Scene: Audio-Enhanced Text Spotting | | 0 |
| InstructOCR: Instruction Boosting Scene Text Spotting | Code | 0 |
| Arbitrary Reading Order Scene Text Spotter with Local Semantics Guidance | | 0 |
| HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction | | 0 |
| FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting | Code | 0 |
| DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training | Code | 1 |
| WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting | Code | 0 |
| CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction | | 0 |
| Block-level Text Spotting with LLMs | | 0 |
| LOGO: Video Text Spotting with Language Collaboration and Glyph Perception Model | | 0 |
| VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization | Code | 2 |
| Mixed Text Recognition with Efficient Parameter Fine-Tuning and Transformer | | 0 |
| Bridging the Gap Between End-to-End and Two-Step Text Spotting | Code | 2 |
| Ensemble Learning for Vietnamese Scene Text Spotting in Urban Environments | | 0 |
| OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition | Code | 0 |
| TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model | | 0 |
| TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document | Code | 5 |
| Efficiently Leveraging Linguistic Priors for Scene Text Spotting | | 0 |
| Beyond the Mud: Datasets and Benchmarks for Computer Vision in Off-Road Racing | | 0 |
| SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting | Code | 1 |
| GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching | Code | 1 |
| Watermark Text Pattern Spotting in Document Images | | 0 |
| GloTSFormer: Global Video Text Spotting Transformer | Code | 0 |
| Inverse-like Antagonistic Scene Text Spotting via Reading-Order Estimation and Dynamic Sampling | | 0 |
| OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition | Code | 0 |
| Word length-aware text spotting: Enhancing detection and recognition in dense text image | | 0 |
| Parrot Captions Teach CLIP to Spot Text | Code | 1 |
| Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis | Code | 2 |
| Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance | Code | 0 |
| Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes | | 0 |
| STEP -- Towards Structured Scene-Text Spotting | Code | 0 |
| Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration | | 0 |
| Deformation Robust Text Spotting with Geometric Prior | | 0 |
| ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer | Code | 1 |
| TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision | | 0 |
| DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting | Code | 2 |
| FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation | Code | 1 |
| Scalable Mask Annotation for Video Text Spotting | Code | 1 |
| ICDAR 2023 Video Text Reading Competition for Dense and Small Text | | 0 |
| Towards Unified Scene Text Spotting based on Sequence Generation | Code | 1 |
| VGTS: Visually Guided Text Spotting for Novel Categories in Historical Manuscripts | | 0 |
| Video text tracking for dense and small text based on pp-yoloe-r and sort algorithm | | 0 |
| Modeling Entities as Semantic Points for Visual Information Extraction in the Wild | | 0 |
| A3S: Adversarial learning of semantic representations for Scene-Text Spotting | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | UNITS | F-measure (%) - Strong Lexicon | 89 | | Unverified |
| 2 | DeepSolo (ViTAEv2-S, TextOCR) | F-measure (%) - Strong Lexicon | 88.1 | | Unverified |
| 3 | DeepSolo (ResNet-50, TextOCR) | F-measure (%) - Strong Lexicon | 88 | | Unverified |
| 4 | DeepSolo (ResNet-50) | F-measure (%) - Strong Lexicon | 86.8 | | Unverified |
| 5 | SRTS | F-measure (%) - Strong Lexicon | 85.6 | | Unverified |
| 6 | TESTR | F-measure (%) - Strong Lexicon | 85.2 | | Unverified |
| 7 | A3S | F-measure (%) - Strong Lexicon | 84.8 | | Unverified |
| 8 | GLASS | F-measure (%) - Strong Lexicon | 84.7 | | Unverified |
| 9 | SwinTextSpotter | F-measure (%) - Strong Lexicon | 83.9 | | Unverified |
| 10 | FOTS | F-measure (%) - Strong Lexicon | 83.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DeepSolo (ViTAEv2-S, TextOCR) | F-measure (%) - No Lexicon | 83.6 | | Unverified |
| 2 | DeepSolo (ResNet-50, TextOCR) | F-measure (%) - No Lexicon | 82.5 | | Unverified |
| 3 | DeepSolo (ResNet-50) | F-measure (%) - No Lexicon | 79.7 | | Unverified |
| 4 | A3S | F-measure (%) - No Lexicon | 79.4 | | Unverified |
| 5 | UNITS | F-measure (%) - No Lexicon | 78.7 | | Unverified |
| 6 | GLASS | F-measure (%) - No Lexicon | 76.6 | | Unverified |
| 7 | DEER | F-measure (%) - No Lexicon | 74.8 | | Unverified |
| 8 | SwinTextSpotter | F-measure (%) - No Lexicon | 74.3 | | Unverified |
| 9 | TESTR | F-measure (%) - No Lexicon | 73.3 | | Unverified |
| 10 | MANGO | F-measure (%) - No Lexicon | 72.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | A3S | F-measure (%) - No Lexicon | 64.4 | | Unverified |
| 2 | DeepSolo (ResNet-50) | F-measure (%) - No Lexicon | 64.2 | | Unverified |
| 3 | SPTS | F-measure (%) - No Lexicon | 63.6 | | Unverified |
| 4 | ABINet++ | F-measure (%) - No Lexicon | 60.2 | | Unverified |
| 5 | TPSNet | F-measure (%) - No Lexicon | 59.7 | | Unverified |
| 6 | MANGO | F-measure (%) - No Lexicon | 58.9 | | Unverified |
| 7 | ABCNet v2 | F-measure (%) - No Lexicon | 57.5 | | Unverified |
| 8 | TextPerceptron | F-measure (%) - No Lexicon | 57 | | Unverified |
| 9 | TESTR | F-measure (%) - No Lexicon | 56 | | Unverified |
| 10 | SwinTextSpotter | F-measure (%) - No Lexicon | 51.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DeepSolo (ViTAEv2-S, TextOCR) | F-measure (%) - No Lexicon | 68.8 | | Unverified |
| 2 | DeepSolo (ResNet-50, TextOCR) | F-measure (%) - No Lexicon | 64.6 | | Unverified |
| 3 | SwinTextSpotter | F-measure (%) - No Lexicon | 55.4 | | Unverified |
| 4 | DeepSolo (ResNet-50) | F-measure (%) - No Lexicon | 48.5 | | Unverified |
| 5 | MaskTextSpotter v2 | F-measure (%) - No Lexicon | 39 | | Unverified |
| 6 | SPTS | F-measure (%) - No Lexicon | 38.3 | | Unverified |
| 7 | ABCNet v2 | F-measure (%) - No Lexicon | 34.5 | | Unverified |
| 8 | TESTR | F-measure (%) - No Lexicon | 34.2 | | Unverified |
| 9 | ABCNet | F-measure (%) - No Lexicon | 22.2 | | Unverified |
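
The benchmark results above are all reported as F-measure (%). As a reminder of what that number usually means, here is a minimal sketch, assuming the conventional definition: the harmonic mean of precision and recall over matched word instances. The function name `f_measure` and the counts are hypothetical and not taken from any listed paper. How a prediction is counted as a match (for example, the overlap criterion, or whether a lexicon is used to correct the transcription) depends on each benchmark's protocol and is not specified on this page.

```python
def f_measure(true_positives: int, num_predictions: int, num_ground_truth: int) -> float:
    """Return the F-measure in percent from raw match counts.

    true_positives:   predictions counted as correct under the benchmark's
                      matching rule (assumed to be decided elsewhere)
    num_predictions:  total word instances the model output
    num_ground_truth: total annotated word instances
    """
    if num_predictions == 0 or num_ground_truth == 0:
        return 0.0
    precision = true_positives / num_predictions
    recall = true_positives / num_ground_truth
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall, scaled to a percentage.
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
score = f_measure(true_positives=80, num_predictions=100, num_ground_truth=120)
print(round(score, 1))  # 72.7
```

With 80 correct predictions out of 100 outputs and 120 annotations, precision is 0.80 and recall is about 0.67, so the F-measure sits between the two, closer to the lower value.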