SOTAVerified

Benchmarking

Papers

Showing 871-880 of 5548 papers

Title | Status | Hype
In Search of Lost Online Test-time Adaptation: A Survey | Code | 1
Re-evaluating Retrosynthesis Algorithms with Syntheseus | Code | 1
MLFMF: Data Sets for Machine Learning for Mathematical Formalization | Code | 1
CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks | Code | 1
Fast hyperboloid decision tree algorithms | Code | 1
MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark | Code | 1
OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift | Code | 1
Object-aware Inversion and Reassembly for Image Editing | Code | 1
To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now | Code | 1
FactCHD: Benchmarking Fact-Conflicting Hallucination Detection | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified