Benchmarking

Papers

Showing 3701–3750 of 5548 papers

Title	Date	Tasks	Status
Predicting Football Match Outcomes with eXplainable Machine Learning and the Kelly Index	Nov 28, 2022	Benchmarking	Unverified
Predicting Quantum Potentials by Deep Neural Network and Metropolis Sampling	Jun 6, 2021	Benchmarking	Unverified
Predicting the Performance of a Computing System with Deep Networks	Feb 27, 2023	Benchmarking	Unverified
Predicting the Probability of Collision of a Satellite with Space Debris: A Bayesian Machine Learning Approach	Nov 17, 2023	Benchmarking, Collision Avoidance	Unverified
Prediction Accuracy & Reliability: Classification and Object Localization under Distribution Shift	Sep 5, 2024	Autonomous Driving, Benchmarking	Unverified
Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks	Mar 30, 2023	Benchmarking	Unverified
Prediction of the Influence of Navigation Scan-path on Perceived Quality of Free-Viewpoint Videos	Oct 10, 2018	Benchmarking, Video Quality Assessment	Unverified
Predictive modelling of a novel anti-adhesion therapy to combat bacterial colonisation of burn wounds	Aug 10, 2017	Benchmarking	Unverified
Predictive Models from Quantum Computer Benchmarks	May 15, 2023	Benchmarking, image-classification	Unverified
Prepare for Trouble and Make it Double. Supervised and Unsupervised Stacking for Anomaly-Based Intrusion Detection	Feb 28, 2022	Benchmarking, Intrusion Detection	Unverified
Benchmarking Machine Reading Comprehension: A Psychological Perspective	Apr 4, 2020	Benchmarking, Machine Reading Comprehension	Unverified
Pretraining boosts out-of-domain robustness for pose estimation	Sep 24, 2019	Animal Pose Estimation, Benchmarking	Unverified
Principles and Guidelines for Evaluating Social Robot Navigation Algorithms	Jun 29, 2023	Benchmarking, Robot Navigation	Unverified
PRISM: Complete Online Decentralized Multi-Agent Pathfinding with Rapid Information Sharing using Motion Constraints	May 12, 2025	Benchmarking	Unverified
Prism: Dynamic and Flexible Benchmarking of LLMs Code Generation with Monte Carlo Tree Search	Apr 7, 2025	Benchmarking, Code Generation	Unverified
Privacy-Preserving Language Model Inference with Instance Obfuscation	Feb 13, 2024	Benchmarking, Language Modeling	Unverified
Privacy Protection in Street-View Panoramas using Depth and Multi-View Imagery	Mar 27, 2019	Benchmarking, Object	Unverified
Probabilistic Robustness in Deep Learning: A Concise yet Comprehensive Guide	Feb 20, 2025	Adversarial Robustness, Benchmarking	Unverified
ProBench: Benchmarking Large Language Models in Competitive Programming	Feb 28, 2025	Attribute, Benchmarking	Unverified
Problem-solving benefits of down-sampled lexicase selection	Jun 10, 2021	Benchmarking	Unverified
Procedural Content Generation: Better Benchmarks for Transfer Reinforcement Learning	May 31, 2021	Benchmarking, Deep Learning	Unverified
Procedural Generalization by Planning with Self-Supervised World Models	Nov 2, 2021	Benchmarking, Meta-Learning	Unverified
ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions	Jul 1, 2024	Benchmarking, Question Generation	Unverified
Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning	Oct 6, 2023	Benchmarking, Federated Learning	Unverified
Progressive Class-level Distillation	May 30, 2025	Benchmarking, Knowledge Distillation	Unverified
Progressive Multi-view Human Mesh Recovery with Self-Supervision	Dec 10, 2022	Benchmarking, Diversity	Unverified
Progressive with Purpose: Guiding Progressive Inpainting DNNs through Context and Structure	Sep 21, 2022	Benchmarking, Image Inpainting	Unverified
Projective simulation applied to the grid-world and the mountain-car problem	May 21, 2014	Benchmarking, reinforcement-learning	Unverified
Project MPG: towards a generalized performance benchmark for LLM capabilities	Oct 28, 2024	Benchmarking, Chatbot	Unverified
Prompting ChatGPT for Chinese Learning as L2: A CEFR and EBCL Level Study	Jan 25, 2025	Benchmarking	Unverified
Prompting Scientific Names for Zero-Shot Species Recognition	Oct 15, 2023	Benchmarking, Zero-Shot Learning	Unverified
Prompt Sketching for Large Language Models	Nov 8, 2023	Arithmetic Reasoning, Benchmarking	Unverified
Proof of Humanity: A Multi-Layer Network Framework for Certifying Human-Originated Content in an AI-Dominated Internet	Apr 2, 2025	Benchmarking	Unverified
Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning	Sep 25, 2024	Benchmarking, Formal Logic	Unverified
ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation	Feb 10, 2024	Benchmarking, Language Modeling	Unverified
Protocol for Executing and Benchmarking Eight Computational Doublet-Detection Methods in Single-Cell RNA Sequencing Data Analysis	Jan 21, 2021	Benchmarking	Unverified
Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking	May 13, 2022	Benchmarking, reinforcement-learning	Unverified
ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding	Nov 7, 2024	Benchmarking, Multiple-choice	Unverified
PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice	Feb 28, 2025	Benchmarking, Diagnostic	Unverified
PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents	Jan 3, 2025	Benchmarking	Unverified
Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms	Oct 11, 2023	Benchmarking, Denoising	Unverified
Share, Collaborate, Benchmark: Advancing Travel Demand Research through rigorous open-source collaboration	Jun 9, 2023	Benchmarking, Time Series	Unverified
PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation	Sep 4, 2024	Benchmarking	Unverified
Pulse Shape-Aided Multipath Delay Estimation for Fine-Grained WiFi Sensing	Jun 27, 2023	Benchmarking	Unverified
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension	Dec 16, 2024	Benchmarking, Image Captioning	Unverified
Pushing Boundaries: Exploring Zero Shot Object Classification with Large Multimodal Models	Dec 30, 2023	Benchmarking, image-classification	Unverified
Pushing the Frontiers of Unconstrained Face Detection and Recognition: IARPA Janus Benchmark A	Jun 1, 2015	Benchmarking, Face Detection	Unverified
PySTACHIO: Python Single-molecule TrAcking stoiCHiometry Intensity and simulatiOn, a flexible, extensible, beginner-friendly and optimized program for analysis of single-molecule microscopy	Mar 18, 2021	Art Analysis, Benchmarking	Unverified
Pythae: Unifying Generative Autoencoders in Python -- A Benchmarking Use Case	Jun 16, 2022	Benchmarking, Density Estimation	Unverified
Python Random Graph Generator	Sep 20, 2017	Benchmarking, Graph Generation	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified