SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1021–1030 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things	Sep 29, 2023	BenchmarkingFederated Learning	CodeCode Available	1	5
Failure Detection in Medical Image Classification: A Reality Check and Benchmarking Testbed	May 27, 2022	BenchmarkingBinary Classification	CodeCode Available	1	5
Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension	Mar 26, 2022	BenchmarkingQuestion Answering	CodeCode Available	1	5
Benchmarking Offline Reinforcement Learning on Real-Robot Hardware	Jul 28, 2023	Benchmarkingreinforcement-learning	CodeCode Available	1	5
AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses	Mar 3, 2025	Benchmarking	CodeCode Available	1	5
Benchmarking Omni-Vision Representation through the Lens of Visual Realms	Jul 14, 2022	BenchmarkingContrastive Learning	CodeCode Available	1	5
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms	Aug 25, 2017	BenchmarkingBIG-bench Machine Learning	CodeCode Available	1	5
FedCV: A Federated Learning Framework for Diverse Computer Vision Tasks	Nov 22, 2021	BenchmarkingFederated Learning	CodeCode Available	1	5
FiFAR: A Fraud Detection Dataset for Learning to Defer	Dec 20, 2023	BenchmarkingDecision Making	CodeCode Available	1	5
A skeletonization algorithm for gradient-based optimization	Sep 5, 2023	BenchmarkingDeep Learning	CodeCode Available	1	5

Show:10 25 50

← PrevPage 103 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified