SOTAVerified

Scene Text Recognition

See Scene Text Detection for leaderboards in this task.

Papers

Showing 51–100 of 269 papers

| Title | Status | Hype |
| --- | --- | --- |
| Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer | Code | 1 |
| Scene Text Image Super-resolution based on Text-conditional Diffusion Models | Code | 1 |
| Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and In-depth Evaluation | Code | 1 |
| DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond | Code | 0 |
| Scene Text Recognition Models Explainability Using Local Features | Code | 1 |
| Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition | Code | 1 |
| Towards Large-scale Building Attribute Mapping using Crowdsourced Images: Scene Text Recognition on Flickr and Problems to be Solved | Code | 0 |
| Orientation-Independent Chinese Text Recognition in Scene Images | Code | 2 |
| Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning | Code | 2 |
| DTrOCR: Decoder-only Transformer for Optical Character Recognition | Code | 2 |
| LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition | Code | 0 |
| Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation | Code | 1 |
| Relational Contrastive Learning for Scene Text Recognition | Code | 1 |
| Multi-Granularity Prediction with Learnable Fusion for Scene Text Recognition | Code | 0 |
| Context Perception Parallel Decoder for Scene Text Recognition | Code | 0 |
| Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement | Code | 1 |
| Revisiting Scene Text Recognition: A Data Perspective | Code | 2 |
| Reading Between the Lanes: Text VideoQA on the Road | Code | 0 |
| DiffusionSTR: Diffusion Model for Scene Text Recognition | | 0 |
| Weakly Supervised Scene Text Generation for Low-resource Languages | | 0 |
| Looking and Listening: Audio Guided Text Recognition | Code | 1 |
| Masked and Permuted Implicit Context Learning for Scene Text Recognition | Code | 0 |
| MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition | Code | 1 |
| CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model | Code | 1 |
| Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition | Code | 1 |
| TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition | Code | 1 |
| Scene Text Recognition with Image-Text Matching-guided Dictionary | | 0 |
| Improving Scene Text Recognition for Character-Level Long-Tailed Distribution | | 0 |
| Context-Aware Selective Label Smoothing for Calibrating Sequence Recognition Model | | 0 |
| Diffusion in the Dark: A Diffusion Model for Low-Light Text Recognition | | 0 |
| Augmented Transformers with Adaptive n-grams Embedding for Multilingual Scene Text Recognition | | 0 |
| Geometric Perception based Efficient Text Recognition | Code | 0 |
| CLIPTER: Looking at the Bigger Picture in Scene Text Recognition | | 0 |
| B-Spline Texture Coefficients Estimator for Screen Content Image Super-Resolution | Code | 1 |
| ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting | Code | 1 |
| Portmanteauing Features for Scene Text Recognition | | 0 |
| Masked Vision-Language Transformers for Scene Text Recognition | Code | 1 |
| Pure Transformer with Integrated Experts for Scene Text Recognition | | 0 |
| Self-supervised Character-to-Character Distillation for Text Recognition | Code | 1 |
| Scene Text Recognition with Semantics | | 0 |
| Scene Text Image Super-Resolution via Content Perceptual Loss and Criss-Cross Transformer Blocks | | 0 |
| Reading Chinese in Natural Scenes with a Bag-of-Radicals Prior | | 0 |
| Out-of-Vocabulary Challenge Report | | 0 |
| Levenshtein OCR | Code | 0 |
| Multi-Granularity Prediction for Scene Text Recognition | Code | 0 |
| Scene Text Recognition with Single-Point Decoding Network | | 0 |
| Vision-Language Adaptive Mutual Decoder for OOV-STR | | 0 |
| 1st Place Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: End-to-End Recognition of Out of Vocabulary Words | | 0 |
| Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition | Code | 1 |
| Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning | | 0 |

Benchmark Results

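| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CLIP4STR-L* | Accuracy | 99.42 | | Unverified |
| 2 | DTrOCR 105M | Accuracy | 99.4 | | Unverified |
| 3 | CLIP4STR-L (DataComp-1B) | Accuracy | 99 | | Unverified |
| 4 | MGP-STR | Accuracy | 98.5 | | Unverified |
| 5 | CLIP4STR-L | Accuracy | 98.5 | | Unverified |
| 6 | CLIP4STR-B | Accuracy | 98.3 | | Unverified |
| 7 | CCD-ViT-Base(ARD_2.8M) | Accuracy | 98.3 | | Unverified |
| 8 | CCD-ViT-Small(ARD_2.8M) | Accuracy | 98.3 | | Unverified |
| 9 | MATRN | Accuracy | 97.9 | | Unverified |
| 10 | S-GTR | Accuracy | 97.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CLIP4STR-H (DFN-5B) | Accuracy | 99.1 | | Unverified |
| 2 | DTrOCR 105M | Accuracy | 98.9 | | Unverified |
| 3 | CLIP4STR-B* | Accuracy | 98.76 | | Unverified |
| 4 | MGP-STR | Accuracy | 98.6 | | Unverified |
| 5 | CLIP4STR-L (DataComp-1B) | Accuracy | 98.6 | | Unverified |
| 6 | CLIP4STR-L | Accuracy | 98.5 | | Unverified |
| 7 | CPPD | Accuracy | 98.5 | | Unverified |
| 8 | CLIP4STR-B | Accuracy | 98.3 | | Unverified |
| 9 | CCD-ViT-Base(ARD_2.8M) | Accuracy | 97.8 | | Unverified |
| 10 | CCD-ViT-Small(ARD_2.8M) | Accuracy | 96.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | DTrOCR 105M | Accuracy | 93.5 | | Unverified |
| 2 | CLIP4STR-L* | Accuracy | 92.6 | | Unverified |
| 3 | CPPD | Accuracy | 91.7 | | Unverified |
| 4 | CLIP4STR-L (DataComp-1B) | Accuracy | 91.4 | | Unverified |
| 5 | MGP-STR | Accuracy | 90.9 | | Unverified |
| 6 | CLIP4STR-L | Accuracy | 90.8 | | Unverified |
| 7 | CLIP4STR-B | Accuracy | 90.6 | | Unverified |
| 8 | SIGA_S | Accuracy | 87.6 | | Unverified |
| 9 | S-GTR | Accuracy | 87.3 | | Unverified |
| 10 | MATRN | Accuracy | 86.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CPPD | Accuracy | 99.7 | | Unverified |
| 2 | CLIP4STR-L (DataComp-1B) | Accuracy | 99.7 | | Unverified |
| 3 | CLIP4STR-B* | Accuracy | 99.65 | | Unverified |
| 4 | MGP-STR | Accuracy | 99.31 | | Unverified |
| 5 | CLIP4STR-B | Accuracy | 99.3 | | Unverified |
| 6 | DTrOCR 105M | Accuracy | 99.1 | | Unverified |
| 7 | CLIP4STR-L | Accuracy | 99 | | Unverified |
| 8 | CCD-ViT-Base(ARD_2.8M) | Accuracy | 98.3 | | Unverified |
| 9 | CCD-ViT-Small(ARD_2.8M) | Accuracy | 98.3 | | Unverified |
| 10 | CCD-ViT-Tiny(ARD_2.8M) | Accuracy | 95.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | DTrOCR 105M | Accuracy | 99.6 | | Unverified |
| 2 | CLIP4STR-L (DataComp-1B) | Accuracy | 99.6 | | Unverified |
| 3 | CLIP4STR-L | Accuracy | 99.5 | | Unverified |
| 4 | CLIP4STR-B (DataComp-1B) | Accuracy | 99.5 | | Unverified |
| 5 | CPPD | Accuracy | 99.3 | | Unverified |
| 6 | CLIP4STR-B | Accuracy | 99.2 | | Unverified |
| 7 | MGP-STR | Accuracy | 98.8 | | Unverified |
| 8 | CCD-ViT-Base(ARD_2.8M) | Accuracy | 98 | | Unverified |
| 9 | CCD-ViT-Small(ARD_2.8M) | Accuracy | 98 | | Unverified |
| 10 | S-GTR | Accuracy | 97.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | DTrOCR 105M | Accuracy | 98.6 | | Unverified |
| 2 | MGP-STR | Accuracy | 98.3 | | Unverified |
| 3 | CLIP4STR-L* | Accuracy | 98.13 | | Unverified |
| 4 | CLIP4STR-L (DataComp-1B) | Accuracy | 98.1 | | Unverified |
| 5 | CLIP4STR-L | Accuracy | 97.4 | | Unverified |
| 6 | CLIP4STR-B | Accuracy | 97.2 | | Unverified |
| 7 | CPPD | Accuracy | 96.7 | | Unverified |
| 8 | CCD-ViT-Base | Accuracy | 96.1 | | Unverified |
| 9 | CCD-ViT-Small | Accuracy | 92.7 | | Unverified |
| 10 | CCD-ViT-Tiny | Accuracy | 91.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Yet Another Text Recognizer | Accuracy | 97.1 | | Unverified |
| 2 | SIGA_T | Accuracy | 97 | | Unverified |
| 3 | SATRN | Accuracy | 96.7 | | Unverified |
| 4 | DAN | Accuracy | 95 | | Unverified |
| 5 | SAFL | Accuracy | 95 | | Unverified |
| 6 | CSTR | Accuracy | 94.8 | | Unverified |
| 7 | Baek et al. | Accuracy | 94.4 | | Unverified |
| 8 | ViTSTR | Accuracy | 94.3 | | Unverified |
| 9 | AON | Accuracy | 91.5 | | Unverified |
| 10 | RARE | Accuracy | 90.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CLIP4STR-H (DFN-5B) | 1:1 Accuracy | 90.9 | | Unverified |
| 2 | CLIP4STR-L (DataComp-1B) | 1:1 Accuracy | 90.6 | | Unverified |
| 3 | CLIP4STR-L | 1:1 Accuracy | 88.8 | | Unverified |
| 4 | CLIP4STR-B | 1:1 Accuracy | 87 | | Unverified |
| 5 | CCD-ViT-Base | 1:1 Accuracy | 86 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CLIP4STR-L (DataComp-1B) | Accuracy (%) | 86.4 | | Unverified |
| 2 | CLIP4STR-L | Accuracy (%) | 85.9 | | Unverified |
| 3 | CLIP4STR-B | Accuracy (%) | 85.8 | | Unverified |
| 4 | MGP-STR | Accuracy (%) | 85.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CLIP4STR-L | 1:1 Accuracy | 81.9 | | Unverified |
| 2 | MGP-STR | 1:1 Accuracy | 81.7 | | Unverified |
| 3 | CLIP4STR-B | 1:1 Accuracy | 81.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CLIP4STR-L | 1:1 Accuracy | 82.7 | | Unverified |
| 2 | CLIP4STR-B | 1:1 Accuracy | 79.8 | | Unverified |
| 3 | CCD-ViT-Base | 1:1 Accuracy | 77.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CLIP4STR-L (DataComp-1B) | Accuracy (%) | 92.2 | | Unverified |
| 2 | MGP-STR | Accuracy (%) | 91 | | Unverified |
| 3 | CLIP4STR-B | Accuracy (%) | 86.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ABINet-LV+TPS++ | Accuracy | 97.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MLDG | Average Accuracy | 19.02 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ABINet-LV+TPS++ | Accuracy | 89.6 | | Unverified |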