SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 11–20 of 5548 papers

Title	Date	Tasks	Status	Hype
EvoRL: A GPU-accelerated Framework for Evolutionary Reinforcement Learning	Jan 25, 2025	BenchmarkingEvolutionary Algorithms	CodeCode Available	7
TaskBench: Benchmarking Large Language Models for Task Automation	Nov 30, 2023	BenchmarkingParameter Prediction	CodeCode Available	6
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X	Mar 30, 2023	BenchmarkingCode Generation	CodeCode Available	5
AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance	Jun 4, 2025	BenchmarkingScheduling	CodeCode Available	5
TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods	Mar 29, 2024	BenchmarkingMultivariate Time Series Forecasting	CodeCode Available	5
The BrowserGym Ecosystem for Web Agent Research	Dec 6, 2024	Benchmarking	CodeCode Available	5
SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation	Jan 16, 2025	Benchmarking	CodeCode Available	5
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations	Dec 10, 2024	AttributeBenchmarking	CodeCode Available	5
Benchmarking the Myopic Trap: Positional Bias in Information Retrieval	May 20, 2025	BenchmarkingInformation Retrieval	CodeCode Available	5
Segment Anything Model for Medical Image Segmentation: Current Applications and Future Directions	Jan 7, 2024	BenchmarkingImage Segmentation	CodeCode Available	5

Show:10 25 50

← PrevPage 2 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified