Benchmarking

Papers

Showing 4201–4250 of 5548 papers

Title	Date	Tasks	Status
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models	Apr 1, 2025	Benchmarking	Unverified
Precise Model Benchmarking with Only a Few Observations	Oct 7, 2024	Benchmarking, model	Unverified
AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels	Aug 30, 2022	Benchmarking	Unverified
Model Performance-Guided Evaluation Data Selection for Effective Prompt Optimization	May 15, 2025	Benchmarking, Clustering	Unverified
Predicting credit default probabilities using machine learning techniques in the face of unequal class distributions	Jul 30, 2019	Benchmarking, BIG-bench Machine Learning	Unverified
Predicting Football Match Outcomes with eXplainable Machine Learning and the Kelly Index	Nov 28, 2022	Benchmarking	Unverified
Predicting Quantum Potentials by Deep Neural Network and Metropolis Sampling	Jun 6, 2021	Benchmarking	Unverified
Predicting the Performance of a Computing System with Deep Networks	Feb 27, 2023	Benchmarking	Unverified
Predicting the Probability of Collision of a Satellite with Space Debris: A Bayesian Machine Learning Approach	Nov 17, 2023	Benchmarking, Collision Avoidance	Unverified
Prediction Accuracy & Reliability: Classification and Object Localization under Distribution Shift	Sep 5, 2024	Autonomous Driving, Benchmarking	Unverified
Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks	Mar 30, 2023	Benchmarking	Unverified
Prediction of the Influence of Navigation Scan-path on Perceived Quality of Free-Viewpoint Videos	Oct 10, 2018	Benchmarking, Video Quality Assessment	Unverified
Predictive modelling of a novel anti-adhesion therapy to combat bacterial colonisation of burn wounds	Aug 10, 2017	Benchmarking	Unverified
Predictive Models from Quantum Computer Benchmarks	May 15, 2023	Benchmarking, image-classification	Unverified
Auto-tuning TensorFlow Threading Model for CPU Backend	Dec 4, 2018	Benchmarking, CPU	Unverified
Prepare for Trouble and Make it Double. Supervised and Unsupervised Stacking for AnomalyBased Intrusion Detection	Feb 28, 2022	Benchmarking, Intrusion Detection	Unverified
Benchmarking Machine Reading Comprehension: A Psychological Perspective	Apr 4, 2020	Benchmarking, Machine Reading Comprehension	Unverified
UCCIX: Irish-eXcellence Large Language Model	May 13, 2024	Benchmarking, Language Modeling	Unverified
Pretraining boosts out-of-domain robustness for pose estimation	Sep 24, 2019	Animal Pose Estimation, Benchmarking	Unverified
Who Said That? Benchmarking Social Media AI Detection	Oct 12, 2023	Benchmarking, Misinformation	Unverified
Principles and Guidelines for Evaluating Social Robot Navigation Algorithms	Jun 29, 2023	Benchmarking, Robot Navigation	Unverified
PRISM: Complete Online Decentralized Multi-Agent Pathfinding with Rapid Information Sharing using Motion Constraints	May 12, 2025	Benchmarking	Unverified
Prism: Dynamic and Flexible Benchmarking of LLMs Code Generation with Monte Carlo Tree Search	Apr 7, 2025	Benchmarking, Code Generation	Unverified
Autoregressive Stochastic Clock Jitter Compensation in Analog-to-Digital Converters	May 8, 2025	Benchmarking	Unverified
Privacy-Preserving Language Model Inference with Instance Obfuscation	Feb 13, 2024	Benchmarking, Language Modeling	Unverified
Privacy Protection in Street-View Panoramas using Depth and Multi-View Imagery	Mar 27, 2019	Benchmarking, Object	Unverified
Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs	May 10, 2024	Benchmarking, Hyperparameter Optimization	Unverified
Probabilistic Robustness in Deep Learning: A Concise yet Comprehensive Guide	Feb 20, 2025	Adversarial Robustness, Benchmarking	Unverified
ProBench: Benchmarking Large Language Models in Competitive Programming	Feb 28, 2025	Attribute, Benchmarking	Unverified
UCLID-Net: Single View Reconstruction in Object Space	Jun 6, 2020	Benchmarking, Decoder	Unverified
UDTIRI: An Online Open-Source Intelligent Road Inspection Benchmark Suite	Apr 18, 2023	Benchmarking, Instance Segmentation	Unverified
A Comprehensive Multi-Illuminant Dataset for Benchmarking of the Intrinsic Image Algorithms	Dec 1, 2015	Benchmarking, Image Generation	Unverified
Automatic vehicle trajectory data reconstruction at scale	Dec 15, 2022	Benchmarking, vehicle detection	Unverified
Problem-solving benefits of down-sampled lexicase selection	Jun 10, 2021	Benchmarking	Unverified
Automatic Target Recognition on Synthetic Aperture Radar Imagery: A Survey	Jul 4, 2020	Benchmarking, Survey	Unverified
Procedural Content Generation: Better Benchmarks for Transfer Reinforcement Learning	May 31, 2021	Benchmarking, Deep Learning	Unverified
Procedural Generalization by Planning with Self-Supervised World Models	Nov 2, 2021	Benchmarking, Meta-Learning	Unverified
UGSL: A Unified Framework for Benchmarking Graph Structure Learning	Aug 21, 2023	Benchmarking, Graph structure learning	Unverified
ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions	Jul 1, 2024	Benchmarking, Question Generation	Unverified
Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning	Oct 6, 2023	Benchmarking, Federated Learning	Unverified
Progressive Class-level Distillation	May 30, 2025	Benchmarking, Knowledge Distillation	Unverified
Progressive Multi-view Human Mesh Recovery with Self-Supervision	Dec 10, 2022	Benchmarking, Diversity	Unverified
Progressive with Purpose: Guiding Progressive Inpainting DNNs through Context and Structure	Sep 21, 2022	Benchmarking, Image Inpainting	Unverified
Projective simulation applied to the grid-world and the mountain-car problem	May 21, 2014	Benchmarking, reinforcement-learning	Unverified
Project MPG: towards a generalized performance benchmark for LLM capabilities	Oct 28, 2024	Benchmarking, Chatbot	Unverified
Automatic segmenting teeth in X-ray images: Trends, a novel data set, benchmarking and future perspectives	Feb 9, 2018	Benchmarking, Image Segmentation	Unverified
Prompting ChatGPT for Chinese Learning as L2: A CEFR and EBCL Level Study	Jan 25, 2025	Benchmarking	Unverified
Prompting Scientific Names for Zero-Shot Species Recognition	Oct 15, 2023	Benchmarking, Zero-Shot Learning	Unverified
Automatic Microprocessor Performance Bug Detection	Nov 17, 2020	Benchmarking	Unverified
Prompt Sketching for Large Language Models	Nov 8, 2023	Arithmetic Reasoning, Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified