Benchmarking

Papers

Showing 1411–1420 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms	Jul 8, 2021	Benchmarking	Code Available	1	5
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning	Dec 11, 2023	Benchmarking, Human-Object Interaction Detection	Code Available	1	5
Ego-Body Pose Estimation via Ego-Head Pose Estimation	Dec 9, 2022	Benchmarking, Disentanglement	Code Available	1	5
Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning	May 30, 2024	Autonomous Driving, Benchmarking	Code Available	1	5
Introducing Milabench: Benchmarking Accelerators for AI	Nov 18, 2024	Benchmarking, Deep Learning	Code Available	1	5
EH-DNAS: End-to-End Hardware-aware Differentiable Neural Architecture Search	Nov 24, 2021	Benchmarking, Neural Architecture Search	Code Available	1	5
Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method	May 22, 2023	Benchmarking, Hallucination	Code Available	1	5
BEND: Benchmarking DNA Language Models on biologically meaningful tasks	Nov 21, 2023	Benchmarking, Language Modeling	Code Available	1	5
EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography	Oct 31, 2024	Benchmarking, Electromyography (EMG)	Code Available	1	5
scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data	Jun 10, 2025	Benchmarking, Data Augmentation	Code Available	1	5

Page 142 of 555

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified