SOTAVerified

Benchmarking

Papers

Showing 4201-4225 of 5548 papers

Title | Status | Hype
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models | | 0
Precise Model Benchmarking with Only a Few Observations | | 0
AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels | | 0
Model Performance-Guided Evaluation Data Selection for Effective Prompt Optimization | | 0
Predicting credit default probabilities using machine learning techniques in the face of unequal class distributions | | 0
Predicting Football Match Outcomes with eXplainable Machine Learning and the Kelly Index | | 0
Predicting Quantum Potentials by Deep Neural Network and Metropolis Sampling | | 0
Predicting the Performance of a Computing System with Deep Networks | | 0
Predicting the Probability of Collision of a Satellite with Space Debris: A Bayesian Machine Learning Approach | | 0
Prediction Accuracy & Reliability: Classification and Object Localization under Distribution Shift | | 0
Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks | | 0
Prediction of the Influence of Navigation Scan-path on Perceived Quality of Free-Viewpoint Videos | | 0
Predictive modelling of a novel anti-adhesion therapy to combat bacterial colonisation of burn wounds | | 0
Predictive Models from Quantum Computer Benchmarks | | 0
Auto-tuning TensorFlow Threading Model for CPU Backend | | 0
Prepare for Trouble and Make it Double. Supervised and Unsupervised Stacking for AnomalyBased Intrusion Detection | | 0
Benchmarking Machine Reading Comprehension: A Psychological Perspective | | 0
UCCIX: Irish-eXcellence Large Language Model | | 0
Pretraining boosts out-of-domain robustness for pose estimation | | 0
Who Said That? Benchmarking Social Media AI Detection | | 0
Principles and Guidelines for Evaluating Social Robot Navigation Algorithms | | 0
PRISM: Complete Online Decentralized Multi-Agent Pathfinding with Rapid Information Sharing using Motion Constraints | | 0
Prism: Dynamic and Flexible Benchmarking of LLMs Code Generation with Monte Carlo Tree Search | | 0
Autoregressive Stochastic Clock Jitter Compensation in Analog-to-Digital Converters | | 0
Privacy-Preserving Language Model Inference with Instance Obfuscation | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified