Benchmarking

Papers

Showing 961–970 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests	May 15, 2025	Benchmarking, Deep Reinforcement Learning	Code Available	1	5
EXPObench: Benchmarking Surrogate-based Optimisation Algorithms on Expensive Black-box Functions	Jun 8, 2021	Bayesian Optimisation, Benchmarking	Code Available	1	5
FedCV: A Federated Learning Framework for Diverse Computer Vision Tasks	Nov 22, 2021	Benchmarking, Federated Learning	Code Available	1	5
MMDetection: Open MMLab Detection Toolbox and Benchmark	Jun 17, 2019	Benchmarking, Instance Segmentation	Code Available	1	5
Working Memory Capacity of ChatGPT: An Empirical Study	Apr 30, 2023	Benchmarking, Language Modeling	Code Available	1	5
Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective	Jul 10, 2024	Benchmarking, Diagnostic	Code Available	1	5
Evaluating Adversarial Attacks on ImageNet: A Reality Check on Misclassification Classes	Nov 22, 2021	Benchmarking	Code Available	1	5
Benchmarking End-to-End Behavioural Cloning on Video Games	Apr 2, 2020	Behavioural cloning, Benchmarking	Code Available	1	5
Deep Learning-Based Synchronization for Uplink NB-IoT	May 22, 2022	Benchmarking, Deep Learning	Code Available	1	5
Benchmarking Natural Language Understanding Services for building Conversational Agents	Mar 13, 2019	Benchmarking, General Classification	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified