SOTAVerified

Benchmarking

Papers

Showing 941-950 of 5548 papers

Title | Status | Hype
Benchmarking Deep Reinforcement Learning for Navigation in Denied Sensor Environments | Code | 1
Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks | Code | 1
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Code | 1
MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration | Code | 1
Coarse-to-Fine Q-attention with Learned Path Ranking | Code | 1
High-Dimensional Inference in Bayesian Networks | Code | 1
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Code | 1
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Code | 1
Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics | Code | 1
Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified