SOTAVerified

Scene Text Recognition

See Scene Text Detection for leaderboards in this task.

Papers

Showing 101–150 of 269 papers

| Title | Status | Hype |
| --- | --- | --- |
| Out of Length Text Recognition with Sub-String Matching | Code | 0 |
| Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition | Code | 0 |
| The First Swahili Language Scene Text Detection and Recognition Dataset | Code | 0 |
| HAAP: Vision-context Hierarchical Attention Autoregressive with Adaptive Permutation for Scene Text Recognition | | 0 |
| Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing | | 0 |
| Mixed Text Recognition with Efficient Parameter Fine-Tuning and Transformer | | 0 |
| JSTR: Judgment Improves Scene Text Recognition | | 0 |
| Open-Vocabulary Scene Text Recognition via Pseudo-Image Labeling and Margin Loss | | 0 |
| IndicSTR12: A Dataset for Indic Scene Text Recognition | | 0 |
| Efficiently Leveraging Linguistic Priors for Scene Text Spotting | | 0 |
| Sequential Visual and Semantic Consistency for Semi-supervised Text Recognition | | 0 |
| Lumos : Empowering Multimodal LLMs with Scene Text Recognition | | 0 |
| Instruction-Guided Scene Text Recognition | Code | 0 |
| CMFN: Cross-Modal Fusion Network for Irregular Scene Text Recognition | | 0 |
| Choose What You Need: Disentangled Representation Learning for Scene Text Recognition Removal and Editing | | 0 |
| OTE: Exploring Accurate Scene Text Recognition Using One Token | Code | 0 |
| IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition | Code | 0 |
| STR-Cert: Robustness Certification for Deep Text Recognition on Deep Learning Pipelines and Vision Transformers | | 0 |
| Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution | | 0 |
| DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond | Code | 0 |
| Towards Large-scale Building Attribute Mapping using Crowdsourced Images: Scene Text Recognition on Flickr and Problems to be Solved | Code | 0 |
| LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition | Code | 0 |
| Multi-Granularity Prediction with Learnable Fusion for Scene Text Recognition | Code | 0 |
| Context Perception Parallel Decoder for Scene Text Recognition | Code | 0 |
| Reading Between the Lanes: Text VideoQA on the Road | Code | 0 |
| DiffusionSTR: Diffusion Model for Scene Text Recognition | | 0 |
| Weakly Supervised Scene Text Generation for Low-resource Languages | | 0 |
| Masked and Permuted Implicit Context Learning for Scene Text Recognition | Code | 0 |
| Scene Text Recognition with Image-Text Matching-guided Dictionary | | 0 |
| Improving Scene Text Recognition for Character-Level Long-Tailed Distribution | | 0 |
| Context-Aware Selective Label Smoothing for Calibrating Sequence Recognition Model | | 0 |
| Diffusion in the Dark: A Diffusion Model for Low-Light Text Recognition | | 0 |
| Augmented Transformers with Adaptive n-grams Embedding for Multilingual Scene Text Recognition | | 0 |
| Geometric Perception based Efficient Text Recognition | Code | 0 |
| CLIPTER: Looking at the Bigger Picture in Scene Text Recognition | | 0 |
| Portmanteauing Features for Scene Text Recognition | | 0 |
| Pure Transformer with Integrated Experts for Scene Text Recognition | | 0 |
| Scene Text Recognition with Semantics | | 0 |
| Scene Text Image Super-Resolution via Content Perceptual Loss and Criss-Cross Transformer Blocks | | 0 |
| Reading Chinese in Natural Scenes with a Bag-of-Radicals Prior | | 0 |
| Out-of-Vocabulary Challenge Report | | 0 |
| Multi-Granularity Prediction for Scene Text Recognition | Code | 0 |
| Levenshtein OCR | Code | 0 |
| Scene Text Recognition with Single-Point Decoding Network | | 0 |
| Vision-Language Adaptive Mutual Decoder for OOV-STR | | 0 |
| 1st Place Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: End-to-End Recognition of Out of Vocabulary Words | | 0 |
| Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning | | 0 |
| SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene Text Recognition | | 0 |
| SVTR: Scene Text Recognition with a Single Visual Model | Code | 0 |
| Towards Open-Set Text Recognition via Label-to-Prototype Learning | | 0 |

Benchmark Results

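| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CLIP4STR-L* | Accuracy | 99.42 | | Unverified |
| 2 | DTrOCR 105M | Accuracy | 99.4 | | Unverified |
| 3 | CLIP4STR-L (DataComp-1B) | Accuracy | 99 | | Unverified |
| 4 | MGP-STR | Accuracy | 98.5 | | Unverified |
| 5 | CLIP4STR-L | Accuracy | 98.5 | | Unverified |
| 6 | CLIP4STR-B | Accuracy | 98.3 | | Unverified |
| 7 | CCD-ViT-Base(ARD_2.8M) | Accuracy | 98.3 | | Unverified |
| 8 | CCD-ViT-Small(ARD_2.8M) | Accuracy | 98.3 | | Unverified |
| 9 | MATRN | Accuracy | 97.9 | | Unverified |
| 10 | S-GTR | Accuracy | 97.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CLIP4STR-H (DFN-5B) | Accuracy | 99.1 | | Unverified |
| 2 | DTrOCR 105M | Accuracy | 98.9 | | Unverified |
| 3 | CLIP4STR-B* | Accuracy | 98.76 | | Unverified |
| 4 | MGP-STR | Accuracy | 98.6 | | Unverified |
| 5 | CLIP4STR-L (DataComp-1B) | Accuracy | 98.6 | | Unverified |
| 6 | CLIP4STR-L | Accuracy | 98.5 | | Unverified |
| 7 | CPPD | Accuracy | 98.5 | | Unverified |
| 8 | CLIP4STR-B | Accuracy | 98.3 | | Unverified |
| 9 | CCD-ViT-Base(ARD_2.8M) | Accuracy | 97.8 | | Unverified |
| 10 | CCD-ViT-Small(ARD_2.8M) | Accuracy | 96.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | DTrOCR 105M | Accuracy | 93.5 | | Unverified |
| 2 | CLIP4STR-L* | Accuracy | 92.6 | | Unverified |
| 3 | CPPD | Accuracy | 91.7 | | Unverified |
| 4 | CLIP4STR-L (DataComp-1B) | Accuracy | 91.4 | | Unverified |
| 5 | MGP-STR | Accuracy | 90.9 | | Unverified |
| 6 | CLIP4STR-L | Accuracy | 90.8 | | Unverified |
| 7 | CLIP4STR-B | Accuracy | 90.6 | | Unverified |
| 8 | SIGA_S | Accuracy | 87.6 | | Unverified |
| 9 | S-GTR | Accuracy | 87.3 | | Unverified |
| 10 | MATRN | Accuracy | 86.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CPPD | Accuracy | 99.7 | | Unverified |
| 2 | CLIP4STR-L (DataComp-1B) | Accuracy | 99.7 | | Unverified |
| 3 | CLIP4STR-B* | Accuracy | 99.65 | | Unverified |
| 4 | MGP-STR | Accuracy | 99.31 | | Unverified |
| 5 | CLIP4STR-B | Accuracy | 99.3 | | Unverified |
| 6 | DTrOCR 105M | Accuracy | 99.1 | | Unverified |
| 7 | CLIP4STR-L | Accuracy | 99 | | Unverified |
| 8 | CCD-ViT-Base(ARD_2.8M) | Accuracy | 98.3 | | Unverified |
| 9 | CCD-ViT-Small(ARD_2.8M) | Accuracy | 98.3 | | Unverified |
| 10 | CCD-ViT-Tiny(ARD_2.8M) | Accuracy | 95.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | DTrOCR 105M | Accuracy | 99.6 | | Unverified |
| 2 | CLIP4STR-L (DataComp-1B) | Accuracy | 99.6 | | Unverified |
| 3 | CLIP4STR-L | Accuracy | 99.5 | | Unverified |
| 4 | CLIP4STR-B (DataComp-1B) | Accuracy | 99.5 | | Unverified |
| 5 | CPPD | Accuracy | 99.3 | | Unverified |
| 6 | CLIP4STR-B | Accuracy | 99.2 | | Unverified |
| 7 | MGP-STR | Accuracy | 98.8 | | Unverified |
| 8 | CCD-ViT-Base(ARD_2.8M) | Accuracy | 98 | | Unverified |
| 9 | CCD-ViT-Small(ARD_2.8M) | Accuracy | 98 | | Unverified |
| 10 | S-GTR | Accuracy | 97.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | DTrOCR 105M | Accuracy | 98.6 | | Unverified |
| 2 | MGP-STR | Accuracy | 98.3 | | Unverified |
| 3 | CLIP4STR-L* | Accuracy | 98.13 | | Unverified |
| 4 | CLIP4STR-L (DataComp-1B) | Accuracy | 98.1 | | Unverified |
| 5 | CLIP4STR-L | Accuracy | 97.4 | | Unverified |
| 6 | CLIP4STR-B | Accuracy | 97.2 | | Unverified |
| 7 | CPPD | Accuracy | 96.7 | | Unverified |
| 8 | CCD-ViT-Base | Accuracy | 96.1 | | Unverified |
| 9 | CCD-ViT-Small | Accuracy | 92.7 | | Unverified |
| 10 | CCD-ViT-Tiny | Accuracy | 91.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Yet Another Text Recognizer | Accuracy | 97.1 | | Unverified |
| 2 | SIGA_T | Accuracy | 97 | | Unverified |
| 3 | SATRN | Accuracy | 96.7 | | Unverified |
| 4 | DAN | Accuracy | 95 | | Unverified |
| 5 | SAFL | Accuracy | 95 | | Unverified |
| 6 | CSTR | Accuracy | 94.8 | | Unverified |
| 7 | Baek et al. | Accuracy | 94.4 | | Unverified |
| 8 | ViTSTR | Accuracy | 94.3 | | Unverified |
| 9 | AON | Accuracy | 91.5 | | Unverified |
| 10 | RARE | Accuracy | 90.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CLIP4STR-H (DFN-5B) | 1:1 Accuracy | 90.9 | | Unverified |
| 2 | CLIP4STR-L (DataComp-1B) | 1:1 Accuracy | 90.6 | | Unverified |
| 3 | CLIP4STR-L | 1:1 Accuracy | 88.8 | | Unverified |
| 4 | CLIP4STR-B | 1:1 Accuracy | 87 | | Unverified |
| 5 | CCD-ViT-Base | 1:1 Accuracy | 86 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CLIP4STR-L (DataComp-1B) | Accuracy (%) | 86.4 | | Unverified |
| 2 | CLIP4STR-L | Accuracy (%) | 85.9 | | Unverified |
| 3 | CLIP4STR-B | Accuracy (%) | 85.8 | | Unverified |
| 4 | MGP-STR | Accuracy (%) | 85.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CLIP4STR-L | 1:1 Accuracy | 81.9 | | Unverified |
| 2 | MGP-STR | 1:1 Accuracy | 81.7 | | Unverified |
| 3 | CLIP4STR-B | 1:1 Accuracy | 81.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CLIP4STR-L | 1:1 Accuracy | 82.7 | | Unverified |
| 2 | CLIP4STR-B | 1:1 Accuracy | 79.8 | | Unverified |
| 3 | CCD-ViT-Base | 1:1 Accuracy | 77.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CLIP4STR-L (DataComp-1B) | Accuracy (%) | 92.2 | | Unverified |
| 2 | MGP-STR | Accuracy (%) | 91 | | Unverified |
| 3 | CLIP4STR-B | Accuracy (%) | 86.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ABINet-LV+TPS++ | Accuracy | 97.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MLDG | Average Accuracy | 19.02 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ABINet-LV+TPS++ | Accuracy | 89.6 | | Unverified |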