SOTAVerified

Benchmarking

Papers

Showing 2811-2820 of 5548 papers

Title | Status | Hype
NeIn: Telling What You Don't Want | - | 0
DetoxBench: Benchmarking Large Language Models for Multitask Fraud & Abuse Detection | - | 0
A Framework for Evaluating PM2.5 Forecasts from the Perspective of Individual Decision Making | Code | 0
Quantum Kernel Methods under Scrutiny: A Benchmarking Study | - | 0
Absolute Ranking: An Essential Normalization for Benchmarking Optimization Algorithms | - | 0
Benchmarking Estimators for Natural Experiments: A Novel Dataset and a Doubly Robust Algorithm | - | 0
Question-Answering Dense Video Events | Code | 0
Shuffle Vision Transformer: Lightweight, Fast and Efficient Recognition of Driver Facial Expression | - | 0
LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts | Code | 0
InfraLib: Enabling Reinforcement Learning and Decision-Making for Large-Scale Infrastructure Management | - | 0
Page 282 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified