SOTAVerified

Benchmarking

Papers

Title	Date	Tasks	Status	Hype
Efficient Lifelong Model Evaluation in an Era of Rapid Progress	Feb 29, 2024	Benchmarking, GPU	Code Available	1
MC-Blur: A Comprehensive Benchmark for Image Deblurring	Dec 1, 2021	Benchmarking, Deblurring	Code Available	1
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework	Dec 7, 2022	Benchmarking	Code Available	1
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation	Nov 10, 2023	Benchmarking, Cloud Computing	Code Available	1
Benchmarking Deep Graph Generative Models for Optimizing New Drug Molecules for COVID-19	Feb 9, 2021	Benchmarking, Q-Learning	Code Available	1
Benchmarking deep inverse models over time, and the neural-adjoint method	Sep 27, 2020	Benchmarking	Code Available	1
A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification	Nov 28, 2022	Benchmarking, image-classification	Code Available	1
Benchmarking Micro-action Recognition: Dataset, Methods, and Applications	Mar 8, 2024	Action Recognition, Benchmarking	Code Available	1
CLoG: Benchmarking Continual Learning of Image Generation Models	Jun 7, 2024	Benchmarking, Continual Learning	Code Available	1
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency	Apr 24, 2025	Benchmarking, Math	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified