
Benchmarking

Papers


Showing 2001–2010 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari	Feb 24, 2018	Atari Games, Benchmarking	Code Available	0	5
Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions	Dec 11, 2024	Benchmarking, Question Answering	Code Available	0	5
Benchmark of Deep Learning Models on Large Healthcare MIMIC Datasets	Oct 23, 2017	Benchmarking, BIG-bench Machine Learning	Code Available	0	5
AlphaZip: Neural Network-Enhanced Lossless Text Compression	Sep 23, 2024	Benchmarking, Data Compression	Code Available	0	5
Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study	Mar 15, 2024	Benchmarking	Code Available	0	5
Identifying Money Laundering Subgraphs on the Blockchain	Oct 10, 2024	Benchmarking	Code Available	0	5
A Wild Bootstrap for Degenerate Kernel Tests	Aug 23, 2014	Benchmarking, Time Series	Code Available	0	5
Identifying the Smallest Adversarial Load Perturbations that Render DC-OPF Infeasible	Jul 10, 2025	Adversarial Attack, Benchmarking	Code Available	0	5
Benchmarking YOLOv5 and YOLOv7 models with DeepSORT for droplet tracking applications	Jan 19, 2023	Benchmarking, GPU	Code Available	0	5
Identifying and Benchmarking Natural Out-of-Context Prediction Problems	Oct 25, 2021	Benchmarking	Code Available	0	5


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified