SOTAVerified

Benchmarking

Papers

Showing 1991–2000 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset	Mar 9, 2023	Benchmarking, Deep Learning	Code Available	0	5
Identifying the Smallest Adversarial Load Perturbations that Render DC-OPF Infeasible	Jul 10, 2025	Adversarial Attack, Benchmarking	Code Available	0	5
Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari	Feb 24, 2018	Atari Games, Benchmarking	Code Available	0	5
IHCV: Discovery of Hidden Time-Dependent Control Variables in Non-Linear Dynamical Systems	Apr 5, 2023	Benchmarking	Code Available	0	5
Benchmark of Deep Learning Models on Large Healthcare MIMIC Datasets	Oct 23, 2017	Benchmarking, BIG-bench Machine Learning	Code Available	0	5
AlphaZip: Neural Network-Enhanced Lossless Text Compression	Sep 23, 2024	Benchmarking, Data Compression	Code Available	0	5
Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study	Mar 15, 2024	Benchmarking	Code Available	0	5
PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models	Jan 28, 2024	Benchmarking, Code Generation	Code Available	0	5
IdeaBench: Benchmarking Large Language Models for Research Idea Generation	Oct 31, 2024	Benchmarking, scientific discovery	Code Available	0	5
Identifying and Benchmarking Natural Out-of-Context Prediction Problems	Oct 25, 2021	Benchmarking	Code Available	0	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified