Benchmarking

Papers

Showing 2331–2340 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking ChatGPT on Algorithmic Reasoning	Apr 4, 2024	Benchmarking	Code Available	0
Schroedinger's Threshold: When the AUC doesn't predict Accuracy	Apr 4, 2024	Benchmarking	Code Available	0
Benchmarking Parameter Control Methods in Differential Evolution for Mixed-Integer Black-Box Optimization	Apr 4, 2024	Benchmarking	Code Available	0
A Comparative Analysis of Word-Level Metric Differential Privacy: Benchmarking The Privacy-Utility Trade-off	Apr 4, 2024	Benchmarking	Code Available	0
DiffBody: Human Body Restoration by Imagining with Generative Diffusion Prior	Apr 4, 2024	Benchmarking, Image Restoration	Unverified	0
NL2KQL: From Natural Language to Kusto Query	Apr 3, 2024	Benchmarking, Natural Language Queries	Unverified	0
Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT	Apr 3, 2024	Benchmarking, General Knowledge	Code Available	1
On the reduction of Linear Parameter-Varying State-Space models	Apr 2, 2024	Benchmarking, Dimensionality Reduction	Unverified	0
Atom-Level Optical Chemical Structure Recognition with Limited Supervision	Apr 2, 2024	Benchmarking	Code Available	1
PATCH! Psychometrics-AssisTed BenCHmarking of Large Language Models against Human Populations: A Case Study of Proficiency in 8th Grade Mathematics	Apr 2, 2024	Benchmarking	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified