SOTAVerified

Benchmarking

Papers


Showing 3711–3720 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Machine Reading Comprehension: A Psychological Perspective	Apr 4, 2020	Benchmarking, Machine Reading Comprehension	Unverified	0
Pretraining boosts out-of-domain robustness for pose estimation	Sep 24, 2019	Animal Pose Estimation, Benchmarking	Unverified	0
Principles and Guidelines for Evaluating Social Robot Navigation Algorithms	Jun 29, 2023	Benchmarking, Robot Navigation	Unverified	0
PRISM: Complete Online Decentralized Multi-Agent Pathfinding with Rapid Information Sharing using Motion Constraints	May 12, 2025	Benchmarking	Unverified	0
Prism: Dynamic and Flexible Benchmarking of LLMs Code Generation with Monte Carlo Tree Search	Apr 7, 2025	Benchmarking, Code Generation	Unverified	0
Privacy-Preserving Language Model Inference with Instance Obfuscation	Feb 13, 2024	Benchmarking, Language Modeling	Unverified	0
Privacy Protection in Street-View Panoramas using Depth and Multi-View Imagery	Mar 27, 2019	Benchmarking, Object	Unverified	0
Probabilistic Robustness in Deep Learning: A Concise yet Comprehensive Guide	Feb 20, 2025	Adversarial Robustness, Benchmarking	Unverified	0
ProBench: Benchmarking Large Language Models in Competitive Programming	Feb 28, 2025	Attribute, Benchmarking	Unverified	0
Problem-solving benefits of down-sampled lexicase selection	Jun 10, 2021	Benchmarking	Unverified	0


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified