SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2481–2490 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Machine Learning Automation Toolbox (MLaut)	Jan 11, 2019	BenchmarkingBIG-bench Machine Learning	CodeCode Available	0	5
Machine Learning Cryptanalysis of a Quantum Random Number Generator	May 7, 2019	BenchmarkingBIG-bench Machine Learning	CodeCode Available	0	5
DispaRisk: Auditing Fairness Through Usable Information	May 20, 2024	BenchmarkingBias Detection	CodeCode Available	0	5
Generalization and Regularization in DQN	Sep 29, 2018	Atari GamesBenchmarking	CodeCode Available	0	5
A Framework for Evaluating PM2.5 Forecasts from the Perspective of Individual Decision Making	Sep 9, 2024	BenchmarkingDecision Making	CodeCode Available	0	5
GenderBench: Evaluation Suite for Gender Biases in LLMs	May 17, 2025	Benchmarking	CodeCode Available	0	5
GNNMerge: Merging of GNN Models Without Accessing Training Data	Mar 5, 2025	BenchmarkingComputational Efficiency	CodeCode Available	0	5
GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations	Jun 17, 2024	BenchmarkingDataset Generation	CodeCode Available	0	5
Exploring Context Generalizability in Citywide Crowd Mobility Prediction: An Analytic Framework and Benchmark	Jun 30, 2021	BenchmarkingPrediction	CodeCode Available	0	5
Benchmarking Language-agnostic Intent Classification for Virtual Assistant Platforms	Jul 1, 2022	BenchmarkingClassification	CodeCode Available	0	5

Show:10 25 50

← PrevPage 249 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified