SOTAVerified

Benchmarking

Papers

Showing 3681–3690 of 5548 papers

Title | Status | Hype
Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO |  | 0
Benchmarking Multilabel Topic Classification in the Kyrgyz Language | Code | 0
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads |  | 0
Benchmarking Data Efficiency and Computational Efficiency of Temporal Action Localization Models |  | 0
Beyond Document Page Classification: Design, Datasets, and Challenges | Code | 0
Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0 | Code | 0
Benchmarking Causal Study to Interpret Large Language Models for Source Code |  | 0
Efficient Benchmarking of Language Models |  | 0
Benchmarking Domain Adaptation for Chemical Processes on the Tennessee Eastman Process | Code | 0
Beyond MD17: the reactive xxMD dataset | Code | 0
Page 369 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified