SOTAVerified

Benchmarking

Papers

Showing 1091–1100 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Code | 1 |
| COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Code | 1 |
| CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Code | 1 |
| Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform | Code | 1 |
| CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation | Code | 1 |
| CoDEx: A Comprehensive Knowledge Graph Completion Benchmark | Code | 1 |
| CLoG: Benchmarking Continual Learning of Image Generation Models | Code | 1 |
| Clinical Prompt Learning with Frozen Language Models | Code | 1 |
| CODEMENV: Benchmarking Large Language Models on Code Migration | Code | 1 |
| CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |