SOTAVerified

Benchmarking

Papers

Showing 3961-3970 of 5548 papers

Title | Status | Hype
Is Word Error Rate a good evaluation metric for Speech Recognition in Indic Languages? | - | 0
To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo | Code | 0
Earnings-22: A Practical Benchmark for Accents in the Wild | Code | 1
Parameter-efficient Model Adaptation for Vision Transformers | Code | 1
Treatment Learning Causal Transformer for Noisy Image Classification | - | 0
A Unified Study of Machine Learning Explanation Evaluation Metrics | - | 0
Benchmarking Deep AUROC Optimization: Loss Functions and Algorithmic Choices | - | 0
Benchmarking Algorithms for Automatic License Plate Recognition | - | 0
Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension | Code | 1
Visual Abductive Reasoning | Code | 1
Page 397 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified