SOTAVerified

Benchmarking

Papers

Showing 4201–4250 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models | | 0 |
| Precise Model Benchmarking with Only a Few Observations | | 0 |
| AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels | | 0 |
| Model Performance-Guided Evaluation Data Selection for Effective Prompt Optimization | | 0 |
| Predicting credit default probabilities using machine learning techniques in the face of unequal class distributions | | 0 |
| Predicting Football Match Outcomes with eXplainable Machine Learning and the Kelly Index | | 0 |
| Predicting Quantum Potentials by Deep Neural Network and Metropolis Sampling | | 0 |
| Predicting the Performance of a Computing System with Deep Networks | | 0 |
| Predicting the Probability of Collision of a Satellite with Space Debris: A Bayesian Machine Learning Approach | | 0 |
| Prediction Accuracy & Reliability: Classification and Object Localization under Distribution Shift | | 0 |
| Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks | | 0 |
| Prediction of the Influence of Navigation Scan-path on Perceived Quality of Free-Viewpoint Videos | | 0 |
| Predictive modelling of a novel anti-adhesion therapy to combat bacterial colonisation of burn wounds | | 0 |
| Predictive Models from Quantum Computer Benchmarks | | 0 |
| Auto-tuning TensorFlow Threading Model for CPU Backend | | 0 |
| Prepare for Trouble and Make it Double. Supervised and Unsupervised Stacking for Anomaly-Based Intrusion Detection | | 0 |
| Benchmarking Machine Reading Comprehension: A Psychological Perspective | | 0 |
| UCCIX: Irish-eXcellence Large Language Model | | 0 |
| Pretraining boosts out-of-domain robustness for pose estimation | | 0 |
| Who Said That? Benchmarking Social Media AI Detection | | 0 |
| Principles and Guidelines for Evaluating Social Robot Navigation Algorithms | | 0 |
| PRISM: Complete Online Decentralized Multi-Agent Pathfinding with Rapid Information Sharing using Motion Constraints | | 0 |
| Prism: Dynamic and Flexible Benchmarking of LLMs Code Generation with Monte Carlo Tree Search | | 0 |
| Autoregressive Stochastic Clock Jitter Compensation in Analog-to-Digital Converters | | 0 |
| Privacy-Preserving Language Model Inference with Instance Obfuscation | | 0 |
| Privacy Protection in Street-View Panoramas using Depth and Multi-View Imagery | | 0 |
| Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs | | 0 |
| Probabilistic Robustness in Deep Learning: A Concise yet Comprehensive Guide | | 0 |
| ProBench: Benchmarking Large Language Models in Competitive Programming | | 0 |
| UCLID-Net: Single View Reconstruction in Object Space | | 0 |
| UDTIRI: An Online Open-Source Intelligent Road Inspection Benchmark Suite | | 0 |
| A Comprehensive Multi-Illuminant Dataset for Benchmarking of the Intrinsic Image Algorithms | | 0 |
| Automatic vehicle trajectory data reconstruction at scale | | 0 |
| Problem-solving benefits of down-sampled lexicase selection | | 0 |
| Automatic Target Recognition on Synthetic Aperture Radar Imagery: A Survey | | 0 |
| Procedural Content Generation: Better Benchmarks for Transfer Reinforcement Learning | | 0 |
| Procedural Generalization by Planning with Self-Supervised World Models | | 0 |
| UGSL: A Unified Framework for Benchmarking Graph Structure Learning | | 0 |
| ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions | | 0 |
| Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning | | 0 |
| Progressive Class-level Distillation | | 0 |
| Progressive Multi-view Human Mesh Recovery with Self-Supervision | | 0 |
| Progressive with Purpose: Guiding Progressive Inpainting DNNs through Context and Structure | | 0 |
| Projective simulation applied to the grid-world and the mountain-car problem | | 0 |
| Project MPG: towards a generalized performance benchmark for LLM capabilities | | 0 |
| Automatic segmenting teeth in X-ray images: Trends, a novel data set, benchmarking and future perspectives | | 0 |
| Prompting ChatGPT for Chinese Learning as L2: A CEFR and EBCL Level Study | | 0 |
| Prompting Scientific Names for Zero-Shot Species Recognition | | 0 |
| Automatic Microprocessor Performance Bug Detection | | 0 |
| Prompt Sketching for Large Language Models | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |