SOTAVerified

Benchmarking

Papers

Showing 361-370 of 5548 papers

Title | Status | Hype
Benchmarking the Myopic Trap: Positional Bias in Information Retrieval | Code | 5
NOVA: A Benchmark for Anomaly Localization and Clinical Reasoning in Brain MRI | | 0
Benchmarking data encoding methods in Quantum Machine Learning | | 0
SlangDIT: Benchmarking LLMs in Interpretative Slang Translation | | 0
SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas | | 0
NavBench: A Unified Robotics Benchmark for Reinforcement Learning-Based Autonomous Navigation | | 0
SzCORE as a benchmark: report from the seizure detection challenge at the 2025 AI in Epilepsy and Neurological Disorders Conference | | 0
HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity | Code | 0
Benchmarking MOEAs for solving continuous multi-objective RL problems | Code | 0
Ice Cream Doesn't Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified