SOTAVerified

Benchmarking

Papers

Showing 1861-1870 of 5548 papers

Title | Status | Hype
----- | ------ | ----
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models | Code | 1
SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation On Diverse Modalities | Code | 1
On Machine Learning Approaches for Protein-Ligand Binding Affinity Prediction | | 0
Separable Operator Networks | Code | 1
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin | Code | 1
AstroMLab 1: Who Wins Astronomy Jeopardy!? | | 0
Benchmarking Vision Language Models for Cultural Understanding | | 0
ConvBench: A Comprehensive Benchmark for 2D Convolution Primitive Evaluation | | 0
When Heterophily Meets Heterogeneity: Challenges and a New Large-Scale Graph Benchmark | Code | 1
Experimental Benchmarking of Energy-saving Sub-Optimal Sliding Mode Control | | 0
Page 187 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
- | ----- | ------ | ------- | -------- | ------
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified