SOTAVerified

Benchmarking

Papers

Showing 1291-1300 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Performance Benchmarking of Psychomotor Skills Using Wearable Devices: An Application in Sport | | 0 |
| A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation | | 0 |
| Benchmarking Active Learning for NILM | | 0 |
| ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain | Code | 0 |
| Reassessing Layer Pruning in LLMs: New Insights and Methods | Code | 0 |
| Benchmarking the Robustness of Optical Flow Estimation to Corruptions | Code | 0 |
| AdamZ: An Enhanced Optimisation Method for Neural Network Training | Code | 0 |
| Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains | | 0 |
| StackEval: Benchmarking LLMs in Coding Assistance | Code | 1 |
| Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |