SOTAVerified

Benchmarking

Papers

Showing 1476-1500 of 5548 papers

Title | Status | Hype
Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets | Code | 1
MNIST-C: A Robustness Benchmark for Computer Vision | Code | 1
Meta-Surrogate Benchmarking for Hyperparameter Optimization | Code | 1
Benchmarking Regression Methods: A comparison with CGAN | Code | 1
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Code | 1
Benchmarking Natural Language Understanding Services for building Conversational Agents | Code | 1
NAS-Bench-101: Towards Reproducible Neural Architecture Search | Code | 1
The StarCraft Multi-Agent Challenge | Code | 1
The Liver Tumor Segmentation Benchmark (LiTS) | Code | 1
LEAF: A Benchmark for Federated Settings | Code | 1
GuacaMol: Benchmarking Models for De Novo Molecular Design | Code | 1
IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics | Code | 1
On Evaluation of Embodied Navigation Agents | Code | 1
Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations | Code | 1
Texygen: A Benchmarking Platform for Text Generation Models | Code | 1
Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining | Code | 1
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms | Code | 1
featsel: A framework for benchmarking of feature selection algorithms and cost functions | Code | 1
Multitask learning and benchmarking with clinical time series data | Code | 1
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset | Code | 1
CIDEr: Consensus-based Image Description Evaluation | Code | 1
Building a Scalable and Interpretable Bayesian Deep Learning Framework for Quality Control of Free Form Surfaces | Code | 1
Visual Place Recognition for Large-Scale UAV Applications | - | 0
Training Transformers with Enforced Lipschitz Constants | - | 0
MUPAX: Multidimensional Problem Agnostic eXplainable AI | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified