Benchmarking

Papers

Showing 4711–4720 of 5548 papers

Title	Date	Tasks	Status	Hype
Roughness Index and Roughness Distance for Benchmarking Medical Segmentation	Mar 23, 2021	Benchmarking, Image Segmentation	Code Available	0
The KANDY Benchmark: Incremental Neuro-Symbolic Learning and Reasoning with Kandinsky Patterns	Feb 27, 2024	Benchmarking, Binary Classification	Code Available	0
MEDFAIR: Benchmarking Fairness for Medical Imaging	Oct 4, 2022	Benchmarking, Fairness	Code Available	0
Benchmarking the Robustness of Optical Flow Estimation to Corruptions	Nov 22, 2024	Autonomous Driving, Benchmarking	Code Available	0
Adaptive Power System Emergency Control using Deep Reinforcement Learning	Mar 9, 2019	Benchmarking, Deep Reinforcement Learning	Code Available	0
Benchmarking the Linear Algebra Awareness of TensorFlow and PyTorch	Feb 20, 2022	Benchmarking	Code Available	0
gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo	Mar 14, 2019	Benchmarking, OpenAI Gym	Code Available	0
Benchmarking the Hooke-Jeeves Method, MTS-LS1, and BSrr on the Large-scale BBOB Function Set	Apr 28, 2022	Benchmarking	Code Available	0
Guidelines and Benchmarks for Deployment of Deep Learning Models on Smartphones as Real-Time Apps	Jan 8, 2019	Benchmarking, CPU	Code Available	0
Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora	May 13, 2025	Benchmarking, Diagnostic	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified