SOTAVerified

Benchmarking

Papers

Showing 1081–1090 of 5548 papers

Title | Status | Hype
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking | Code | 1
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Code | 1
A Review and Efficient Implementation of Scene Graph Generation Metrics | Code | 1
2.5D Visual Relationship Detection | Code | 1
Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform | Code | 1
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation | Code | 1
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents | Code | 1
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Code | 1
Coarse-to-Fine Q-attention with Learned Path Ranking | Code | 1
Clinical Prompt Learning with Frozen Language Models | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified