Benchmarking

Papers

Showing 3651–3700 of 5548 papers

Title	Date	Tasks	Status
Perona: Robust Infrastructure Fingerprinting for Resource-Efficient Big Data Analytics	Nov 15, 2022	Benchmarking	Unverified
PerSEval: Assessing Personalization in Text Summarizers	Jun 29, 2024	Benchmarking, Human Judgment Correlation	Unverified
Personalised Feedback Framework for Online Education Programmes Using Generative AI	Oct 14, 2024	Benchmarking, Management	Unverified
Personalized Multimodal Large Language Models: A Survey	Dec 3, 2024	Benchmarking, Survey	Unverified
Personalized On-Device E-health Analytics with Decentralized Block Coordinate Descent	Dec 17, 2021	Benchmarking, Diagnostic	Unverified
Person Re-Identification by Unsupervised Video Matching	Nov 25, 2016	Benchmarking, Dynamic Time Warping	Unverified
Person Re-Identification in Identity Regression Space	Jun 25, 2018	Benchmarking, Incremental Learning	Unverified
Person Re-identification in the Wild	Apr 9, 2016	Benchmarking, Pedestrian Detection	Unverified
Person Search by Multi-Scale Matching	Jul 23, 2018	Benchmarking, Human Detection	Unverified
Person Search by Multi-Scale Matching	Sep 1, 2018	Benchmarking, Human Detection	Unverified
Perspective on recent developments and challenges in regulatory and systems genomics	Nov 7, 2024	Benchmarking	Unverified
Perspectives on the State and Future of Deep Learning -- 2023	Dec 7, 2023	Benchmarking, Deep Learning	Unverified
Perturbation-based exploration methods in deep reinforcement learning	Nov 10, 2020	Atari Games, Benchmarking	Unverified
PGLearn -- An Open-Source Learning Toolkit for Optimal Power Flow	May 28, 2025	Benchmarking	Unverified
PGLib-CO2: A Power Grid Library for Computing and Optimizing Carbon Emissions	Jun 17, 2025	Benchmarking	Unverified
PhD Thesis on Code Modulated Interferometric Imaging System using Phased Arrays	Jul 19, 2021	Benchmarking	Unverified
Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle	Jul 18, 2024	Benchmarking, Language Modeling	Unverified
PhilHumans: Benchmarking Machine Learning for Personal Health	May 4, 2024	Action Anticipation, Benchmarking	Unverified
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding	Jan 27, 2025	Benchmarking, Common Sense Reasoning	Unverified
PhySense: Principle-Based Physics Reasoning Benchmarking for Large Language Models	May 30, 2025	Benchmarking	Unverified
Physics-Learning AI Datamodel (PLAID) datasets: a collection of physics simulations for machine learning	May 5, 2025	Benchmarking	Unverified
PhytoSynth: Leveraging Multi-modal Generative Models for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach	May 3, 2025	Benchmarking, Image-to-Image Translation	Unverified
PieTrack: An MOT solution based on synthetic data training and self-supervised domain adaptation	Jul 22, 2022	Benchmarking, Domain Adaptation	Unverified
PISTOL: Dataset Compilation Pipeline for Structural Unlearning of LLMs	Jun 24, 2024	Benchmarking, Machine Unlearning	Unverified
Pitfalls of topology-aware image segmentation	Dec 19, 2024	Benchmarking, Image Segmentation	Unverified
pix2pockets: Shot Suggestions in 8-Ball Pool from a Single Image in the Wild	Apr 16, 2025	Benchmarking, object-detection	Unverified
PKLot-A robust dataset for parking lot classification	Jul 1, 2015	Benchmarking, Classification	Unverified
PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI	May 19, 2025	Benchmarking, Minecraft	Unverified
Plant in Cupboard, Orange on Rably, Inat Aphone. Benchmarking Incremental Learning of Situation and Language Model using a Text-Simulated Situated Environment	Feb 17, 2025	Benchmarking, Common Sense Reasoning	Unverified
Point Cloud Compression and Objective Quality Assessment: A Survey	Jun 28, 2025	Autonomous Driving, Benchmarking	Unverified
Point Cloud Objective Quality: Benchmarking Features and Quality Evaluation	Apr 4, 2025	Attribute, Benchmarking	Unverified
Polarization and Index Modulations: a Theoretical and Practical Perspective	Mar 20, 2018	Benchmarking, Navigate	Unverified
Policy Entropy for Out-of-Distribution Classification	May 25, 2020	Benchmarking, Classification	Unverified
Polyp-E: Benchmarking the Robustness of Deep Segmentation Models via Polyp Editing	Oct 22, 2024	Attribute, Benchmarking	Unverified
Portfolio Benchmarking under Drawdown Constraint and Stochastic Sharpe Ratio	Oct 26, 2016	Benchmarking	Unverified
PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions	Jun 20, 2024	Animal Pose Estimation, Autonomous Driving	Unverified
Pose Estimation for Non-Cooperative Spacecraft Rendezvous Using Convolutional Neural Networks	Sep 19, 2018	Benchmarking, Image Generation	Unverified
Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation	May 1, 2025	Benchmarking, Position	Unverified
Position: Benchmarking is Limited in Reinforcement Learning Research	Jun 23, 2024	Benchmarking, Position	Unverified
Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks	Feb 20, 2025	Benchmarking, Combinatorial Optimization	Unverified
Position: There are no Champions in Long-Term Time Series Forecasting	Feb 19, 2025	Benchmarking, Position	Unverified
Post-FEC BER Benchmarking for Bit-Interleaved Coded Modulation with Probabilistic Shaping	Apr 24, 2020	Benchmarking	Unverified
Post-hoc labeling of arbitrary EEG recordings for data-efficient evaluation of neural decoding methods	Nov 22, 2017	Benchmarking, EEG	Unverified
Deep Neural Operator Driven Real Time Inference for Nuclear Systems to Enable Digital Twin Solutions	Aug 15, 2023	Benchmarking, Computational Efficiency	Unverified
PowerGraph: A power grid benchmark dataset for graph neural networks	Feb 5, 2024	Articles, Benchmarking	Unverified
Power Line Communication vs. Talkative Power Conversion: A Benchmarking Study	Apr 16, 2025	Benchmarking	Unverified
Practical Design and Benchmarking of Generative AI Applications for Surgical Billing and Coding	Jan 7, 2025	Benchmarking, Code Generation	Unverified
Practical, Fast and Robust Point Cloud Registration for 3D Scene Stitching and Object Localization	Nov 8, 2021	3D Feature Matching, Benchmarking	Unverified
Precise Model Benchmarking with Only a Few Observations	Oct 7, 2024	Benchmarking, model	Unverified
Predicting credit default probabilities using machine learning techniques in the face of unequal class distributions	Jul 30, 2019	Benchmarking, BIG-bench Machine Learning	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified