SOTAVerified

Benchmarking

Papers

Showing 931–940 of 5548 papers

Title | Status | Hype
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Code | 1
Coarse-to-Fine Q-attention with Learned Path Ranking | Code | 1
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Code | 1
Benchmarking Deep Learning Interpretability in Time Series Predictions | Code | 1
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Code | 1
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Code | 1
Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform | Code | 1
Benchmarking Deep Models for Salient Object Detection | Code | 1
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking | Code | 1
ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified