Benchmarking

Papers

Showing 21–30 of 5548 papers

Title	Date	Tasks	Status	Hype
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X	Mar 30, 2023	Benchmarking, Code Generation	Code Available	5
OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics	Jun 14, 2025	Benchmarking	Code Available	4
TerraTorch: The Geospatial Foundation Models Toolkit	Mar 26, 2025	Benchmarking, Decoder	Code Available	4
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models	Mar 20, 2025	Benchmarking, Reinforcement Learning (RL)	Code Available	4
Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation	Feb 23, 2025	Benchmarking	Code Available	4
Building reliable sim driving agents by scaling self-play	Feb 20, 2025	Autonomous Vehicles, Benchmarking	Code Available	4
A deep learning framework for efficient pathology image analysis	Feb 18, 2025	Benchmarking, Deep Learning	Code Available	4
Accelerating Data Processing and Benchmarking of AI Models for Pathology	Feb 10, 2025	Benchmarking	Code Available	4
Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound	Feb 7, 2025	Benchmarking	Code Available	4
Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation	Feb 4, 2025	Benchmarking, Information Retrieval	Code Available	4

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified