SOTAVerified

Benchmarking

Papers

Showing 431–440 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning | Code | 1 |
| ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models | Code | 1 |
| CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning | Code | 1 |
| CIBench: Evaluating Your LLMs with a Code Interpreter Plugin | Code | 1 |
| Benchmarking Data Science Agents | Code | 1 |
| AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph | Code | 1 |
| CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness | Code | 1 |
| CIDEr: Consensus-based Image Description Evaluation | Code | 1 |
| AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials | Code | 1 |
| AD-LLM: Benchmarking Large Language Models for Anomaly Detection | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |