SOTAVerified

Benchmarking

Papers

Showing 3651–3700 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Perona: Robust Infrastructure Fingerprinting for Resource-Efficient Big Data Analytics | | 0 |
| PerSEval: Assessing Personalization in Text Summarizers | | 0 |
| Personalised Feedback Framework for Online Education Programmes Using Generative AI | | 0 |
| Personalized Multimodal Large Language Models: A Survey | | 0 |
| Personalized On-Device E-health Analytics with Decentralized Block Coordinate Descent | | 0 |
| Person Re-Identification by Unsupervised Video Matching | | 0 |
| Person Re-Identification in Identity Regression Space | | 0 |
| Person Re-identification in the Wild | | 0 |
| Person Search by Multi-Scale Matching | | 0 |
| Person Search by Multi-Scale Matching | | 0 |
| Perspective on recent developments and challenges in regulatory and systems genomics | | 0 |
| Perspectives on the State and Future of Deep Learning -- 2023 | | 0 |
| Perturbation-based exploration methods in deep reinforcement learning | | 0 |
| PGLearn -- An Open-Source Learning Toolkit for Optimal Power Flow | | 0 |
| PGLib-CO2: A Power Grid Library for Computing and Optimizing Carbon Emissions | | 0 |
| PhD Thesis on Code Modulated Interferometric Imaging System using Phased Arrays | | 0 |
| Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle | | 0 |
| PhilHumans: Benchmarking Machine Learning for Personal Health | | 0 |
| PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding | | 0 |
| PhySense: Principle-Based Physics Reasoning Benchmarking for Large Language Models | | 0 |
| Physics-Learning AI Datamodel (PLAID) datasets: a collection of physics simulations for machine learning | | 0 |
| PhytoSynth: Leveraging Multi-modal Generative Models for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach | | 0 |
| PieTrack: An MOT solution based on synthetic data training and self-supervised domain adaptation | | 0 |
| PISTOL: Dataset Compilation Pipeline for Structural Unlearning of LLMs | | 0 |
| Pitfalls of topology-aware image segmentation | | 0 |
| pix2pockets: Shot Suggestions in 8-Ball Pool from a Single Image in the Wild | | 0 |
| PKLot-A robust dataset for parking lot classification | | 0 |
| PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI | | 0 |
| Plant in Cupboard, Orange on Rably, Inat Aphone. Benchmarking Incremental Learning of Situation and Language Model using a Text-Simulated Situated Environment | | 0 |
| Point Cloud Compression and Objective Quality Assessment: A Survey | | 0 |
| Point Cloud Objective Quality: Benchmarking Features and Quality Evaluation | | 0 |
| Polarization and Index Modulations: a Theoretical and Practical Perspective | | 0 |
| Policy Entropy for Out-of-Distribution Classification | | 0 |
| Polyp-E: Benchmarking the Robustness of Deep Segmentation Models via Polyp Editing | | 0 |
| Portfolio Benchmarking under Drawdown Constraint and Stochastic Sharpe Ratio | | 0 |
| PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions | | 0 |
| Pose Estimation for Non-Cooperative Spacecraft Rendezvous Using Convolutional Neural Networks | | 0 |
| Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation | | 0 |
| Position: Benchmarking is Limited in Reinforcement Learning Research | | 0 |
| Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks | | 0 |
| Position: There are no Champions in Long-Term Time Series Forecasting | | 0 |
| Post-FEC BER Benchmarking for Bit-Interleaved Coded Modulation with Probabilistic Shaping | | 0 |
| Post-hoc labeling of arbitrary EEG recordings for data-efficient evaluation of neural decoding methods | | 0 |
| Deep Neural Operator Driven Real Time Inference for Nuclear Systems to Enable Digital Twin Solutions | | 0 |
| PowerGraph: A power grid benchmark dataset for graph neural networks | | 0 |
| Power Line Communication vs. Talkative Power Conversion: A Benchmarking Study | | 0 |
| Practical Design and Benchmarking of Generative AI Applications for Surgical Billing and Coding | | 0 |
| Practical, Fast and Robust Point Cloud Registration for 3D Scene Stitching and Object Localization | | 0 |
| Precise Model Benchmarking with Only a Few Observations | | 0 |
| Predicting credit default probabilities using machine learning techniques in the face of unequal class distributions | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |