SOTAVerified

Benchmarking

Papers

Showing 921-930 of 5548 papers

Title | Status | Hype
Efficient Lifelong Model Evaluation in an Era of Rapid Progress | Code | 1
MC-Blur: A Comprehensive Benchmark for Image Deblurring | Code | 1
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Code | 1
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Code | 1
Benchmarking Deep Graph Generative Models for Optimizing New Drug Molecules for COVID-19 | Code | 1
Benchmarking deep inverse models over time, and the neural-adjoint method | Code | 1
A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification | Code | 1
Benchmarking Micro-action Recognition: Dataset, Methods, and Applications | Code | 1
CLoG: Benchmarking Continual Learning of Image Generation Models | Code | 1
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified