SOTAVerified

Benchmarking

Papers

Showing 1541-1550 of 5548 papers

Title | Status | Hype
Knowledge Enhanced Conditional Imputation for Healthcare Time-series | Code | 0
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0
Towards Enhancing Fault Tolerance in Neural Networks | Code | 0
KhabarChin: Automatic Detection of Important News in the Persian Language | Code | 0
AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge | Code | 0
Ants can orienteer a thief in their robbery | Code | 0
Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals | Code | 0
Benchmarking Educational Program Repair | Code | 0
ANTHROPOS-V: benchmarking the novel task of Crowd Volume Estimation | Code | 0
Adversarial Environment Generation for Learning to Navigate the Web | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified