SOTAVerified

Benchmarking

Papers

Showing 971-980 of 5548 papers

Title | Status | Hype
LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning | Code | 1
Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | Code | 1
MLonMCU: TinyML Benchmarking with Fast Retargeting | Code | 1
Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials | Code | 1
FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods | Code | 1
PaReprop: Fast Parallelized Reversible Backpropagation | Code | 1
Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models | Code | 1
Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive? | Code | 1
KoLA: Carefully Benchmarking World Knowledge of Large Language Models | Code | 1
AQuA: A Benchmarking Tool for Label Quality Assessment | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified