SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1811–1820 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Bridging the Generalisation Gap: Synthetic Data Generation for Multi-Site Clinical Model Validation	Apr 29, 2025	BenchmarkingFairness	CodeCode Available	0	5
Adaptive Power System Emergency Control using Deep Reinforcement Learning	Mar 9, 2019	BenchmarkingDeep Reinforcement Learning	CodeCode Available	0	5
BRI3L: A Brightness Illusion Image Dataset for Identification and Localization of Regions of Illusory Perception	Feb 7, 2024	Benchmarking	CodeCode Available	0	5
Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective	May 28, 2025	BenchmarkingMemorization	CodeCode Available	0	5
InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion	May 28, 2023	BenchmarkingDecision Making	CodeCode Available	0	5
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions	Jul 28, 2017	Autonomous VehiclesBenchmarking	CodeCode Available	0	5
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context	Mar 29, 2024	BenchmarkingSentence	CodeCode Available	0	5
BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery	Jan 2, 2025	BenchmarkingExperimental Design	CodeCode Available	0	5
AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies	Feb 19, 2024	Benchmarking	CodeCode Available	0	5
Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture	Jun 10, 2024	BenchmarkingDecoder	CodeCode Available	0	5

Show:10 25 50

← PrevPage 182 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified