SOTAVerified

Benchmarking

Papers

Showing 3711–3720 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking Machine Reading Comprehension: A Psychological Perspective | | 0 |
| Pretraining boosts out-of-domain robustness for pose estimation | | 0 |
| Principles and Guidelines for Evaluating Social Robot Navigation Algorithms | | 0 |
| PRISM: Complete Online Decentralized Multi-Agent Pathfinding with Rapid Information Sharing using Motion Constraints | | 0 |
| Prism: Dynamic and Flexible Benchmarking of LLMs Code Generation with Monte Carlo Tree Search | | 0 |
| Privacy-Preserving Language Model Inference with Instance Obfuscation | | 0 |
| Privacy Protection in Street-View Panoramas using Depth and Multi-View Imagery | | 0 |
| Probabilistic Robustness in Deep Learning: A Concise yet Comprehensive Guide | | 0 |
| ProBench: Benchmarking Large Language Models in Competitive Programming | | 0 |
| Problem-solving benefits of down-sampled lexicase selection | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |