SOTAVerified

Benchmarking

Papers

Showing 721–730 of 5548 papers

Title | Status | Hype
BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models | Code | 1
Deluca -- A Differentiable Control Library: Environments, Methods, and Benchmarking | Code | 1
CharacterBench: Benchmarking Character Customization of Large Language Models | Code | 1
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models | Code | 1
DependEval: Benchmarking LLMs for Repository Dependency Understanding | Code | 1
Chaos as an interpretable benchmark for forecasting and data-driven modelling | Code | 1
Bag of Tricks for Adversarial Training | Code | 1
Descending through a Crowded Valley — Benchmarking Deep Learning Optimizers | Code | 1
DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios | Code | 1
CCTV-Gun: Benchmarking Handgun Detection in CCTV Images | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified