SOTAVerified

Text Spotting

Scene text spotting combines scene text detection and scene text recognition in a single end-to-end system: the model both locates text in an image and reads it. In other words, it is the task of reading natural text in the wild.
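
To make the definition concrete, here is a minimal sketch of a text spotter's output contract: every spotted instance pairs a region (the detection result) with a string (the recognition result). The names `SpottedText`, `spot_text`, `detect`, and `recognize` are hypothetical. The sketch chains two callables at inference time and does not model the shared features or joint training that make a spotter end-to-end.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # region boundary: a quadrilateral or an arbitrary-shaped outline


@dataclass
class SpottedText:
    polygon: Polygon  # where the text is (detection output)
    text: str         # what it says (recognition output)


def spot_text(
    image: object,
    detect: Callable[[object], Sequence[Polygon]],
    recognize: Callable[[object, Polygon], str],
) -> List[SpottedText]:
    """Pair each detected region with its transcription.

    `detect` and `recognize` are hypothetical stand-ins for a spotter's
    detection and recognition heads; joint training is not modelled here.
    """
    results: List[SpottedText] = []
    for polygon in detect(image):
        text = recognize(image, polygon)
        if text:  # assumption: discard regions that yield an empty transcription
            results.append(SpottedText(polygon=polygon, text=text))
    return results


if __name__ == "__main__":
    # Toy stand-ins: one box that reads "EXIT" and one the recognizer cannot read.
    boxes = [[(0, 0), (10, 0), (10, 5), (0, 5)], [(20, 0), (30, 0), (30, 5), (20, 5)]]

    def fake_detect(img):
        return boxes

    def fake_recognize(img, poly):
        return "EXIT" if poly[0][0] == 0 else ""

    for item in spot_text(None, fake_detect, fake_recognize):
        print(item.text, item.polygon)
```

Keeping detection and recognition as separate callables mirrors the two-step view; end-to-end spotters replace the chain with one network whose heads share a backbone.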

Papers

Showing 1-50 of 112 papers

Title | Status | Hype
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document | Code | 5
VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization | Code | 2
Bridging the Gap Between End-to-End and Two-Step Text Spotting | Code | 2
Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis | Code | 2
DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting | Code | 2
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting | Code | 2
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition | Code | 2
GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking | Code | 1
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting | Code | 1
TextInPlace: Indoor Visual Place Recognition in Repetitive Structures with Scene Text Spotting and Verification | Code | 1
DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training | Code | 1
SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting | Code | 1
GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching | Code | 1
Parrot Captions Teach CLIP to Spot Text | Code | 1
ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer | Code | 1
FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation | Code | 1
Scalable Mask Annotation for Video Text Spotting | Code | 1
Towards Unified Scene Text Spotting based on Sequence Generation | Code | 1
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training | Code | 1
SPTS v2: Single-Point Scene Text Spotting | Code | 1
ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting | Code | 1
GLASS: Global to Local Attention for Scene-Text Spotting | Code | 1
Text Spotting Transformers | Code | 1
End-to-End Video Text Spotting with Transformer | Code | 1
SPTS: Single-Point Text Spotting | Code | 1
A Bilingual, OpenWorld Video Text Dataset and End-to-end Video Text Spotter with Transformer | Code | 1
TPSNet: Reverse Thinking of Thin Plate Splines for Arbitrary Shape Scene Text Representation | Code | 1
Dictionary-Guided Scene Text Recognition | Code | 1
ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting | Code | 1
PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text | Code | 1
Scene Text Retrieval via Joint Text Detection and Similarity Learning | Code | 1
Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution | Code | 1
AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting | Code | 1
Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting | Code | 1
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network | Code | 1
ICDAR 2019 Competition on Large-scale Street View Text with Partial Labeling -- RRC-LSVT | Code | 1
Text-Aware Image Restoration with Diffusion Models | | 0
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models | Code | 0
CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR | | 0
Hear the Scene: Audio-Enhanced Text Spotting | | 0
InstructOCR: Instruction Boosting Scene Text Spotting | Code | 0
Arbitrary Reading Order Scene Text Spotter with Local Semantics Guidance | | 0
HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction | | 0
FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting | Code | 0
WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting | Code | 0
CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction | | 0
Block-level Text Spotting with LLMs | | 0
LOGO: Video Text Spotting with Language Collaboration and Glyph Perception Model | | 0
Mixed Text Recognition with Efficient Parameter Fine-Tuning and Transformer | | 0
Ensemble Learning for Vietnamese Scene Text Spotting in Urban Environments | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | UNITS | F-measure (%) - Strong Lexicon | 89 | | Unverified
2 | DeepSolo (ViTAEv2-S, TextOCR) | F-measure (%) - Strong Lexicon | 88.1 | | Unverified
3 | DeepSolo (ResNet-50, TextOCR) | F-measure (%) - Strong Lexicon | 88 | | Unverified
4 | DeepSolo (ResNet-50) | F-measure (%) - Strong Lexicon | 86.8 | | Unverified
5 | SRTS | F-measure (%) - Strong Lexicon | 85.6 | | Unverified
6 | TESTR | F-measure (%) - Strong Lexicon | 85.2 | | Unverified
7 | A3S | F-measure (%) - Strong Lexicon | 84.8 | | Unverified
8 | GLASS | F-measure (%) - Strong Lexicon | 84.7 | | Unverified
9 | SwinTextSpotter | F-measure (%) - Strong Lexicon | 83.9 | | Unverified
10 | FOTS | F-measure (%) - Strong Lexicon | 83.6 | | Unverified
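
In a lexicon setting such as the Strong Lexicon metric in the table above, evaluation typically supplies a small per-image list of candidate words, and a common practice is to replace each recognized string with the closest lexicon entry by edit distance before scoring. The sketch below assumes that nearest-neighbour rule with case-insensitive Levenshtein distance; `levenshtein` and `correct_with_lexicon` are illustrative names, and individual papers may apply the lexicon differently (for example, with a distance threshold).

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to every prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def correct_with_lexicon(word: str, lexicon: Iterable[str]) -> str:
    """Assumed rule: snap a recognized word to its nearest lexicon entry (case-insensitive)."""
    candidates = list(lexicon)
    if not candidates:
        return word  # no lexicon available: keep the raw prediction ("No Lexicon" setting)
    # min() keeps the first candidate on ties.
    return min(candidates, key=lambda w: levenshtein(word.lower(), w.lower()))


if __name__ == "__main__":
    print(correct_with_lexicon("EXlT", ["EXIT", "EAST", "EDIT"]))  # prints EXIT
```

Here the misread "EXlT" is one substitution away from "EXIT" and two away from the other candidates, so the lexicon repairs it. This kind of correction is one reason lexicon-based scores run higher than the No Lexicon scores reported in the other tables.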

# | Model | Metric | Claimed | Verified | Status
1 | DeepSolo (ViTAEv2-S, TextOCR) | F-measure (%) - No Lexicon | 83.6 | | Unverified
2 | DeepSolo (ResNet-50, TextOCR) | F-measure (%) - No Lexicon | 82.5 | | Unverified
3 | DeepSolo (ResNet-50) | F-measure (%) - No Lexicon | 79.7 | | Unverified
4 | A3S | F-measure (%) - No Lexicon | 79.4 | | Unverified
5 | UNITS | F-measure (%) - No Lexicon | 78.7 | | Unverified
6 | GLASS | F-measure (%) - No Lexicon | 76.6 | | Unverified
7 | DEER | F-measure (%) - No Lexicon | 74.8 | | Unverified
8 | SwinTextSpotter | F-measure (%) - No Lexicon | 74.3 | | Unverified
9 | TESTR | F-measure (%) - No Lexicon | 73.3 | | Unverified
10 | MANGO | F-measure (%) - No Lexicon | 72.9 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | A3S | F-measure (%) - No Lexicon | 64.4 | | Unverified
2 | DeepSolo (ResNet-50) | F-measure (%) - No Lexicon | 64.2 | | Unverified
3 | SPTS | F-measure (%) - No Lexicon | 63.6 | | Unverified
4 | ABINet++ | F-measure (%) - No Lexicon | 60.2 | | Unverified
5 | TPSNet | F-measure (%) - No Lexicon | 59.7 | | Unverified
6 | MANGO | F-measure (%) - No Lexicon | 58.9 | | Unverified
7 | ABCNet v2 | F-measure (%) - No Lexicon | 57.5 | | Unverified
8 | TextPerceptron | F-measure (%) - No Lexicon | 57 | | Unverified
9 | TESTR | F-measure (%) - No Lexicon | 56 | | Unverified
10 | SwinTextSpotter | F-measure (%) - No Lexicon | 51.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DeepSolo (ViTAEv2-S, TextOCR) | F-measure (%) - No Lexicon | 68.8 | | Unverified
2 | DeepSolo (ResNet-50, TextOCR) | F-measure (%) - No Lexicon | 64.6 | | Unverified
3 | SwinTextSpotter | F-measure (%) - No Lexicon | 55.4 | | Unverified
4 | DeepSolo (ResNet-50) | F-measure (%) - No Lexicon | 48.5 | | Unverified
5 | Mask TextSpotter v2 | F-measure (%) - No Lexicon | 39 | | Unverified
6 | SPTS | F-measure (%) - No Lexicon | 38.3 | | Unverified
7 | ABCNet v2 | F-measure (%) - No Lexicon | 34.5 | | Unverified
8 | TESTR | F-measure (%) - No Lexicon | 34.2 | | Unverified
9 | ABCNet | F-measure (%) - No Lexicon | 22.2 | | Unverified
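
All of the benchmark tables above report F-measure, the harmonic mean of precision and recall over spotted words, expressed as a percentage. In end-to-end evaluation a prediction usually counts as correct only when its region overlaps a still-unmatched ground-truth region (IoU above 0.5 is the common threshold) and its transcription matches. The sketch below assumes axis-aligned boxes, greedy one-to-one matching, and case-insensitive string comparison; official evaluation scripts typically use polygon IoU and dataset-specific rules (such as ignoring "don't care" regions), so treat `end_to_end_f_measure` as a hypothetical illustration rather than a replacement for them.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), axis-aligned
Item = Tuple[Box, str]                   # (region, transcription)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def end_to_end_f_measure(preds: List[Item], gts: List[Item], iou_thresh: float = 0.5) -> float:
    """F-measure in percent; a true positive needs IoU > iou_thresh AND a matching string."""
    matched = set()  # indices of ground-truth items already claimed by a prediction
    true_pos = 0
    for box, text in preds:
        for k, (gt_box, gt_text) in enumerate(gts):
            if k in matched:
                continue
            if iou(box, gt_box) > iou_thresh and text.lower() == gt_text.lower():
                matched.add(k)  # greedy: first qualifying ground truth wins
                true_pos += 1
                break
    precision = true_pos / len(preds) if preds else 0.0
    recall = true_pos / len(gts) if gts else 0.0
    if precision + recall == 0:
        return 0.0
    return 100 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gts = [((0, 0, 10, 10), "EXIT"), ((20, 0, 30, 10), "PUSH")]
    preds = [((0, 0, 10, 10), "exit"), ((20, 0, 30, 10), "PULL"), ((50, 50, 60, 60), "OPEN")]
    print(round(end_to_end_f_measure(preds, gts), 1))
```

In the usage example, "PULL" is well localized but misread, so it does not count. With one correct word out of three predictions and two ground-truth words, P = 1/3 and R = 1/2, giving F = 2PR / (P + R) = 40%. This shows why end-to-end F-measure penalizes recognition errors even when detection is perfect.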