Benchmarking

Papers

Showing 271–280 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks	Oct 30, 2023	Benchmarking, object-detection	Code Available	2	5
Desbordante: from benchmarking suite to high-performance science-intensive data profiler (preprint)	Jan 14, 2023	Benchmarking	Code Available	2	5
DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering	Jul 15, 2025	Benchmarking, Instruction Following	Code Available	2	5
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer	Mar 21, 2025	Benchmarking, Video Generation	Code Available	2	5
Datasets and Benchmarks for Offline Safe Reinforcement Learning	Jun 15, 2023	Autonomous Driving, Benchmarking	Code Available	2	5
BARS: Towards Open Benchmarking for Recommender Systems	May 19, 2022	Benchmarking, Click-Through Rate Prediction	Code Available	2	5
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs	Jun 13, 2024	Benchmarking, GPU	Code Available	2	5
DaisyRec 2.0: Benchmarking Recommendation for Rigorous Evaluation	Jun 22, 2022	Benchmarking, Recommendation Systems	Code Available	2	5
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation	Jun 24, 2024	Benchmarking, Image Generation	Code Available	2	5
Craftium: An Extensible Framework for Creating Reinforcement Learning Environments	Jul 4, 2024	Benchmarking, Minecraft	Code Available	2	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified