SOTAVerified

Benchmarking

Papers

Showing 1021-1030 of 5548 papers

Title | Status | Hype
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates | Code | 1
Clinical Prompt Learning with Frozen Language Models | Code | 1
AI Agents That Matter | Code | 1
CLoG: Benchmarking Continual Learning of Image Generation Models | Code | 1
ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models | Code | 1
AI Accelerator Survey and Trends | Code | 1
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Code | 1
Large Scale MRI Collection and Segmentation of Cirrhotic Liver | Code | 1
Benchmarking Language Models for Code Syntax Understanding | Code | 1
AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses | Code | 1
Page 103 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified