Benchmarking

Papers

Showing 871–880 of 5548 papers

Title	Date	Tasks	Status	Hype
In Search of Lost Online Test-time Adaptation: A Survey	Oct 31, 2023	Benchmarking, GPU	Code Available	1
Re-evaluating Retrosynthesis Algorithms with Syntheseus	Oct 30, 2023	Benchmarking, Multi-step retrosynthesis	Code Available	1
MLFMF: Data Sets for Machine Learning for Mathematical Formalization	Oct 24, 2023	Benchmarking, Recommendation Systems	Code Available	1
CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks	Oct 23, 2023	Benchmarking	Code Available	1
Fast hyperboloid decision tree algorithms	Oct 20, 2023	Benchmarking, Riemannian optimization	Code Available	1
MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark	Oct 20, 2023	Benchmarking, de-en	Code Available	1
OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift	Oct 19, 2023	Adversarial Robustness, Benchmarking	Code Available	1
Object-aware Inversion and Reassembly for Image Editing	Oct 18, 2023	Benchmarking, Denoising	Code Available	1
To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now	Oct 18, 2023	Adversarial Robustness	Code Available	1
FactCHD: Benchmarking Fact-Conflicting Hallucination Detection	Oct 18, 2023	Benchmarking, Hallucination	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified