SOTAVerified

Benchmarking

Papers

Showing 1011-1020 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform | Code | 1 |
| CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Code | 1 |
| Coarse-to-Fine Q-attention with Learned Path Ranking | Code | 1 |
| Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089 | Code | 1 |
| COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Code | 1 |
| CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Code | 1 |
| NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens | Code | 1 |
| CLoG: Benchmarking Continual Learning of Image Generation Models | Code | 1 |
| Clinical Prompt Learning with Frozen Language Models | Code | 1 |
| CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |