SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1251–1260 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation	Feb 18, 2024	BenchmarkingLanguage Modeling	CodeCode Available	1	5
HINT3: Raising the bar for Intent Detection in the Wild	Sep 29, 2020	BenchmarkingIntent Detection	CodeCode Available	1	5
Hopfield-Enhanced Deep Neural Networks for Artifact-Resilient Brain State Decoding	Nov 6, 2023	BenchmarkingData Compression	CodeCode Available	1	5
HazeSpace2M: A Dataset for Haze Aware Single Image Dehazing	Sep 25, 2024	BenchmarkingImage Dehazing	CodeCode Available	1	5
4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs	Apr 28, 2024	Benchmarking	CodeCode Available	1	5
CODEMENV: Benchmarking Large Language Models on Code Migration	Jun 1, 2025	Benchmarking	CodeCode Available	1	5
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking	Jan 22, 2020	Benchmarkingobject-detection	CodeCode Available	1	5
Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks	Nov 25, 2024	Benchmarkingobject-detection	CodeCode Available	1	5
AIPerf: Automated machine learning as an AI-HPC benchmark	Aug 17, 2020	AutoMLBenchmarking	CodeCode Available	1	5
Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency	Jun 14, 2024	Benchmarking	CodeCode Available	1	5

Show:10 25 50

← PrevPage 126 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified