SOTAVerified

Benchmarking

Papers

Showing 981–990 of 5548 papers

Title | Status | Hype
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Code | 1
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset | Code | 1
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Code | 1
Multi-Agent Environments for Vehicle Routing Problems | Code | 1
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking | Code | 1
CommonPower: A Framework for Safe Data-Driven Smart Grid Control | Code | 1
Working Memory Capacity of ChatGPT: An Empirical Study | Code | 1
Benchmarking: Past, Present and Future | Code | 1
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs | Code | 1
Large Scale MRI Collection and Segmentation of Cirrhotic Liver | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified