SOTAVerified

Benchmarking

Papers

Showing 3131-3140 of 5548 papers

Title | Status | Hype
A Synthetic Benchmarking Pipeline to Compare Camera Calibration Algorithms |  | 0
Conditionally Invariant Representation Learning for Disentangling Cellular Heterogeneity |  | 0
SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency |  | 0
InstructEval: Systematic Evaluation of Instruction Selection Methods |  | 0
Learning Environment Models with Continuous Stochastic Dynamics |  | 0
Benchmarking Large Language Model Capabilities for Conditional Generation |  | 0
Principles and Guidelines for Evaluating Social Robot Navigation Algorithms |  | 0
Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors |  | 0
Uncovering the Limits of Machine Learning for Automatic Vulnerability Detection | Code | 1
Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity |  | 0
Page 314 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified