
Benchmarking

Papers


Showing 2841–2850 of 5548 papers

Title	Date	Tasks	Status	Hype
MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark	Oct 20, 2023	Benchmarking, de-en	Code Available	1
Standardised workflow for mass spectrometry-based single-cell proteomics data processing and analysis using the scp package	Oct 20, 2023	Benchmarking	Unverified	0
Benchmarking GPUs on SVBRDF Extractor Model	Oct 19, 2023	Benchmarking, GPU	Unverified	0
Almost Equivariance via Lie Algebra Convolutions	Oct 19, 2023	Benchmarking	Unverified	0
OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift	Oct 19, 2023	Adversarial Robustness, Benchmarking	Code Available	1
Formalizing and Benchmarking Prompt Injection Attacks and Defenses	Oct 19, 2023	Benchmarking	Code Available	2
FactCHD: Benchmarking Fact-Conflicting Hallucination Detection	Oct 18, 2023	Benchmarking, Hallucination	Code Available	1
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions	Oct 18, 2023	Benchmarking, Visual Grounding	Code Available	0
To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now	Oct 18, 2023	Adversarial Robustness	Code Available	1
Object-aware Inversion and Reassembly for Image Editing	Oct 18, 2023	Benchmarking, Denoising	Code Available	1



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified