SOTAVerified

Benchmarking

Papers

Showing 2841-2850 of 5548 papers

Title | Status | Hype
MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark | Code | 1
Standardised workflow for mass spectrometry-based single-cell proteomics data processing and analysis using the scp package | | 0
Benchmarking GPUs on SVBRDF Extractor Model | | 0
Almost Equivariance via Lie Algebra Convolutions | | 0
OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift | Code | 1
Formalizing and Benchmarking Prompt Injection Attacks and Defenses | Code | 2
FactCHD: Benchmarking Fact-Conflicting Hallucination Detection | Code | 1
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions | Code | 0
To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now | Code | 1
Object-aware Inversion and Reassembly for Image Editing | Code | 1
Page 285 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified