SOTAVerified

Benchmarking

Papers

Showing 1861–1870 of 5548 papers

Title	Date	Tasks	Status	Hype
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models	Jul 16, 2024	Benchmarking, Code Generation	Code Available	1
SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation On Diverse Modalities	Jul 16, 2024	Benchmarking, Domain Adaptation	Code Available	1
On Machine Learning Approaches for Protein-Ligand Binding Affinity Prediction	Jul 15, 2024	Active Learning, Benchmarking	Unverified	0
Separable Operator Networks	Jul 15, 2024	Benchmarking, GPU	Code Available	1
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin	Jul 15, 2024	Benchmarking	Code Available	1
AstroMLab 1: Who Wins Astronomy Jeopardy!?	Jul 15, 2024	Astronomy, Benchmarking	Unverified	0
Benchmarking Vision Language Models for Cultural Understanding	Jul 15, 2024	Benchmarking, Question Answering	Unverified	0
ConvBench: A Comprehensive Benchmark for 2D Convolution Primitive Evaluation	Jul 15, 2024	Benchmarking	Unverified	0
When Heterophily Meets Heterogeneity: Challenges and a New Large-Scale Graph Benchmark	Jul 15, 2024	Benchmarking, Graph Learning	Code Available	1
Experimental Benchmarking of Energy-saving Sub-Optimal Sliding Mode Control	Jul 14, 2024	Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified