SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2731–2740 of 5548 papers

Title	Date	Tasks	Status	Hype
Extended Labeled Faces in-the-Wild (ELFW): Augmenting Classes for Face Segmentation	Jun 24, 2020	BenchmarkingData Augmentation	—Unverified	0
General Scales Unlock AI Evaluation with Explanatory and Predictive Power	Mar 9, 2025	BenchmarkingSpecificity	—Unverified	0
Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design	Apr 14, 2025	BenchmarkingLanguage Modeling	—Unverified	0
Generating Artificial Outliers in the Absence of Genuine Ones -- a Survey	Jun 5, 2020	BenchmarkingExperimental Design	—Unverified	0
A Survey of Parameters Associated with the Quality of Benchmarks in NLP	Oct 14, 2022	Benchmarking	—Unverified	0
Generating Diverse Synthetic Datasets for Evaluation of Real-life Recommender Systems	Nov 27, 2024	AutoMLBenchmarking	—Unverified	0
Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning	Jun 16, 2024	BenchmarkingMath	—Unverified	0
Generating Synthetic Electronic Health Record (EHR) Data: A Review with Benchmarking	Nov 6, 2024	Benchmarking	—Unverified	0
Generation of Large District Heating System Models Using Open-Source Data and Tools: An Exemplary Workflow	Dec 18, 2024	Benchmarking	—Unverified	0
Benchmarking Post-Hoc Unknown-Category Detection in Food Recognition	Mar 24, 2025	BenchmarkingFood Recognition	—Unverified	0

Show:10 25 50

← PrevPage 274 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified