SOTAVerified

Benchmarking

Papers

Showing 1241–1250 of 5548 papers

Title | Status | Hype
A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care | Code | 1
GENEVA: Benchmarking Generalizability for Event Argument Extraction with Hundreds of Event Types and Argument Roles | Code | 1
CommonPower: A Framework for Safe Data-Driven Smart Grid Control | Code | 1
Benchmarking Language Model Creativity: A Case Study on Code Generation | Code | 1
CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification | Code | 1
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics | Code | 1
A Comprehensive Benchmark for RNA 3D Structure-Function Modeling | Code | 1
GEOM-Drugs Revisited: Toward More Chemically Accurate Benchmarks for 3D Molecule Generation | Code | 1
Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs | Code | 1
Combinatorial Optimization with Policy Adaptation using Latent Space Search | Code | 1
Page 125 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified