SOTAVerified

Benchmarking

Papers

Showing 371-380 of 5548 papers

Title | Status | Hype
DaisyRec 2.0: Benchmarking Recommendation for Rigorous Evaluation | Code | 2
Benchmarking Robustness of 3D Point Cloud Recognition Against Common Corruptions | Code | 2
PEDANTS: Cheap but Effective and Interpretable Answer Equivalence | Code | 2
Class-incremental Learning for Time Series: Benchmark and Evaluation | Code | 2
Benchmarking Benchmark Leakage in Large Language Models | Code | 2
ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling | Code | 2
CausalGym: Benchmarking causal interpretability methods on linguistic tasks | Code | 2
Benchmarking Synthetic Tabular Data: A Multi-Dimensional Evaluation Framework | Code | 2
PINNacle: A Comprehensive Benchmark of Physics-Informed Neural Networks for Solving PDEs | Code | 2
Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations | Code | 2
Page 38 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified