SOTAVerified

Benchmarking

Papers

Showing 1681-1690 of 5548 papers

Title | Status | Hype
CKnowEdit: A New Chinese Knowledge Editing Dataset for Linguistics, Facts, and Logic Error Correction in LLMs | | 0
A Framework for Evaluating PM2.5 Forecasts from the Perspective of Individual Decision Making | Code | 0
Insights from Benchmarking Frontier Language Models on Web App Code Generation | Code | 1
Benchmarking Estimators for Natural Experiments: A Novel Dataset and a Doubly Robust Algorithm | | 0
Absolute Ranking: An Essential Normalization for Benchmarking Optimization Algorithms | | 0
Quantum Kernel Methods under Scrutiny: A Benchmarking Study | | 0
PlantSeg: A Large-Scale In-the-wild Dataset for Plant Disease Segmentation | Code | 2
Question-Answering Dense Video Events | Code | 0
InfraLib: Enabling Reinforcement Learning and Decision-Making for Large-Scale Infrastructure Management | | 0
Prediction Accuracy & Reliability: Classification and Object Localization under Distribution Shift | | 0
Page 169 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified