Benchmarking

Papers

Title	Date	Tasks	Status	Hype
An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science	Feb 23, 2025	Benchmarking, Code Generation	Code Available	0
Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs	Jul 6, 2024	Benchmarking, Dataset Generation	Code Available	0
On-orbit model training for satellite imagery with label proportions	Jun 21, 2023	Benchmarking, Earth Observation	Code Available	0
LimeSoDa: A Dataset Collection for Benchmarking of Machine Learning Regressors in Digital Soil Mapping	Feb 27, 2025	Benchmarking	Code Available	0
Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture	Jun 10, 2024	Benchmarking, Decoder	Code Available	0
Rethinking the Reference-based Distinctive Image Captioning	Jul 22, 2022	Attribute, Benchmarking	Code Available	0
Linear energy storage and flexibility model with ramp rate, ramping, deadline and capacity constraints	Sep 12, 2024	Benchmarking	Code Available	0
BRI3L: A Brightness Illusion Image Dataset for Identification and Localization of Regions of Illusory Perception	Feb 7, 2024	Benchmarking	Code Available	0
BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery	Jan 2, 2025	Benchmarking, Experimental Design	Code Available	0
BONES: a Benchmark fOr Neural Estimation of Shapley values	Jul 23, 2024	Benchmarking	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified