SOTAVerified

Benchmarking

Papers

Showing 1991-2000 of 5548 papers

Title | Status | Hype
BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset | Code | 0
Identifying the Smallest Adversarial Load Perturbations that Render DC-OPF Infeasible | Code | 0
Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari | Code | 0
IHCV: Discovery of Hidden Time-Dependent Control Variables in Non-Linear Dynamical Systems | Code | 0
Benchmark of Deep Learning Models on Large Healthcare MIMIC Datasets | Code | 0
AlphaZip: Neural Network-Enhanced Lossless Text Compression | Code | 0
Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study | Code | 0
PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models | Code | 0
IdeaBench: Benchmarking Large Language Models for Research Idea Generation | Code | 0
Identifying and Benchmarking Natural Out-of-Context Prediction Problems | Code | 0
Page 200 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified