Benchmarking

Papers

Showing 5031–5040 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Model-Based Reinforcement Learning	Jul 3, 2019	Benchmarking, model	Code Available	0
Benchmarking Misuse Mitigation Against Covert Adversaries	Jun 6, 2025	Benchmarking, Language Modeling	Code Available	0
To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo	Mar 30, 2022	Benchmarking, Person-centric Visual Grounding	Code Available	0
Noisy Ostracods: A Fine-Grained, Imbalanced Real-World Dataset for Benchmarking Robust Machine Learning and Label Correction Methods	Dec 3, 2024	Benchmarking	Code Available	0
No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets	Feb 4, 2025	All, Benchmarking	Code Available	0
To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo	May 1, 2022	Benchmarking, Person-centric Visual Grounding	Code Available	0
AstroVision: Towards Autonomous Feature Detection and Description for Missions to Small Bodies Using Deep Learning	Aug 3, 2022	Benchmarking	Code Available	0
AKFruitYield: Modular benchmarking and video analysis software for Azure Kinect cameras for fruit size and fruit yield estimation in apple orchards	Oct 6, 2023	Benchmarking	Code Available	0
ShuffleMix: Improving Representations via Channel-Wise Shuffle of Interpolated Hidden States	May 30, 2023	Benchmarking, Data Augmentation	Code Available	0
NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark	Apr 10, 2025	Benchmarking	Code Available	0


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified