SOTAVerified

Benchmarking

Papers

Showing 451-460 of 5548 papers

Title | Status | Hype
animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics | Code | 1
AD-LLM: Benchmarking Large Language Models for Anomaly Detection | Code | 1
An Improved Metric and Benchmark for Assessing the Performance of Virtual Screening Models | Code | 1
AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials | Code | 1
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph | Code | 1
CIDEr: Consensus-based Image Description Evaluation | Code | 1
CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning | Code | 1
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Code | 1
CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness | Code | 1
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified