SOTAVerified

Benchmarking

Papers

Showing 1741-1750 of 5548 papers

Title | Status | Hype
Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM | Code | 0
Air Learning: A Deep Reinforcement Learning Gym for Autonomous Aerial Robot Visual Navigation | Code | 0
Can a single neuron learn predictive uncertainty? | Code | 0
Can AI Validate Science? Benchmarking LLMs for Accurate Scientific Claim Evidence Reasoning | Code | 0
Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models | Code | 0
Integrating Expert Knowledge into Logical Programs via LLMs | Code | 0
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models | Code | 0
Analyzing the Feature Extractor Networks for Face Image Synthesis | Code | 0
InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition | Code | 0
Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified