SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1361–1370 of 5548 papers

Title	Date	Tasks	Status	Hype
Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms	Nov 30, 2023	BenchmarkingOpenAI Gym	CodeCode Available	1
Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs	Sep 18, 2021	BenchmarkingComplex Query Answering	CodeCode Available	1
Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA	Dec 29, 2023	AnatomyBenchmarking	CodeCode Available	1
CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions	Jun 26, 2025	BenchmarkingDrug Design	CodeCode Available	1
Benchmarking Image Retrieval for Visual Localization	Nov 24, 2020	Autonomous DrivingBenchmarking	CodeCode Available	1
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering	Mar 26, 2024	BenchmarkingMachine Reading Comprehension	CodeCode Available	1
Benchmarking the Robustness of Deep Neural Networks to Common Corruptions in Digital Pathology	Jun 30, 2022	BenchmarkingDiagnostic	CodeCode Available	1
Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets	Dec 10, 2021	Benchmarking	CodeCode Available	1
Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets	Jan 29, 2024	BenchmarkingMachine Translation	CodeCode Available	1
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks	Jun 14, 2020	BenchmarkingDeep Reinforcement Learning	CodeCode Available	1

Show:10 25 50

← PrevPage 137 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified