Benchmarking

Papers


Showing 4201–4225 of 5548 papers

Title	Date	Tasks	Status
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models	Apr 1, 2025	Benchmarking	Unverified
Precise Model Benchmarking with Only a Few Observations	Oct 7, 2024	Benchmarking, model	Unverified
AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels	Aug 30, 2022	Benchmarking	Unverified
Model Performance-Guided Evaluation Data Selection for Effective Prompt Optimization	May 15, 2025	Benchmarking, Clustering	Unverified
Predicting credit default probabilities using machine learning techniques in the face of unequal class distributions	Jul 30, 2019	Benchmarking, BIG-bench Machine Learning	Unverified
Predicting Football Match Outcomes with eXplainable Machine Learning and the Kelly Index	Nov 28, 2022	Benchmarking	Unverified
Predicting Quantum Potentials by Deep Neural Network and Metropolis Sampling	Jun 6, 2021	Benchmarking	Unverified
Predicting the Performance of a Computing System with Deep Networks	Feb 27, 2023	Benchmarking	Unverified
Predicting the Probability of Collision of a Satellite with Space Debris: A Bayesian Machine Learning Approach	Nov 17, 2023	Benchmarking, Collision Avoidance	Unverified
Prediction Accuracy & Reliability: Classification and Object Localization under Distribution Shift	Sep 5, 2024	Autonomous Driving, Benchmarking	Unverified
Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks	Mar 30, 2023	Benchmarking	Unverified
Prediction of the Influence of Navigation Scan-path on Perceived Quality of Free-Viewpoint Videos	Oct 10, 2018	Benchmarking, Video Quality Assessment	Unverified
Predictive modelling of a novel anti-adhesion therapy to combat bacterial colonisation of burn wounds	Aug 10, 2017	Benchmarking	Unverified
Predictive Models from Quantum Computer Benchmarks	May 15, 2023	Benchmarking, image-classification	Unverified
Auto-tuning TensorFlow Threading Model for CPU Backend	Dec 4, 2018	Benchmarking, CPU	Unverified
Prepare for Trouble and Make it Double. Supervised and Unsupervised Stacking for AnomalyBased Intrusion Detection	Feb 28, 2022	Benchmarking, Intrusion Detection	Unverified
Benchmarking Machine Reading Comprehension: A Psychological Perspective	Apr 4, 2020	Benchmarking, Machine Reading Comprehension	Unverified
UCCIX: Irish-eXcellence Large Language Model	May 13, 2024	Benchmarking, Language Modeling	Unverified
Pretraining boosts out-of-domain robustness for pose estimation	Sep 24, 2019	Animal Pose Estimation, Benchmarking	Unverified
Who Said That? Benchmarking Social Media AI Detection	Oct 12, 2023	Benchmarking, Misinformation	Unverified
Principles and Guidelines for Evaluating Social Robot Navigation Algorithms	Jun 29, 2023	Benchmarking, Robot Navigation	Unverified
PRISM: Complete Online Decentralized Multi-Agent Pathfinding with Rapid Information Sharing using Motion Constraints	May 12, 2025	Benchmarking	Unverified
Prism: Dynamic and Flexible Benchmarking of LLMs Code Generation with Monte Carlo Tree Search	Apr 7, 2025	Benchmarking, Code Generation	Unverified
Autoregressive Stochastic Clock Jitter Compensation in Analog-to-Digital Converters	May 8, 2025	Benchmarking	Unverified
Privacy-Preserving Language Model Inference with Instance Obfuscation	Feb 13, 2024	Benchmarking, Language Modeling	Unverified



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified