SOTAVerified

Benchmarking

Papers

Showing 2171–2180 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management | | 0 |
| Breakpoint: Scalable evaluation of system-level reasoning in LLM code agents | | 0 |
| A new pathway to generative artificial intelligence by minimizing the maximum entropy | | 0 |
| Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance | | 0 |
| BraTS-Path Challenge: Assessing Heterogeneous Histopathologic Brain Tumor Sub-regions | | 0 |
| Adaptive Gradient Methods with Local Guarantees | | 0 |
| Object Pose Estimation in Robotics Revisited | | 0 |
| BOX3D: Lightweight Camera-LiDAR Fusion for 3D Object Detection and Localization | | 0 |
| Scale MLPerf-0.6 models on Google TPU-v3 Pods | | 0 |
| Boundary Detection Benchmarking: Beyond F-Measures | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |