SOTAVerified

Benchmarking

Papers

Showing 1251–1260 of 5548 papers

Title | Status | Hype
CoDEx: A Comprehensive Knowledge Graph Completion Benchmark | Code | 1
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents | Code | 1
Benchmarking Reinforcement Learning Techniques for Autonomous Navigation | Code | 1
Benchmarking emergency department triage prediction models with machine learning and large public electronic health records | Code | 1
4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs | Code | 1
Benchmarking the Performance of Bayesian Optimization across Multiple Experimental Materials Science Domains | Code | 1
Graph Neural Network-Based Anomaly Detection for River Network Systems | Code | 1
Graph Robustness Benchmark: Benchmarking the Adversarial Robustness of Graph Machine Learning | Code | 1
A framework for benchmarking clustering algorithms | Code | 1
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates | Code | 1
Page 126 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified