SOTAVerified

Benchmarking

Papers

Showing 3701-3750 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Predicting Football Match Outcomes with eXplainable Machine Learning and the Kelly Index | | 0 |
| Predicting Quantum Potentials by Deep Neural Network and Metropolis Sampling | | 0 |
| Predicting the Performance of a Computing System with Deep Networks | | 0 |
| Predicting the Probability of Collision of a Satellite with Space Debris: A Bayesian Machine Learning Approach | | 0 |
| Prediction Accuracy & Reliability: Classification and Object Localization under Distribution Shift | | 0 |
| Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks | | 0 |
| Prediction of the Influence of Navigation Scan-path on Perceived Quality of Free-Viewpoint Videos | | 0 |
| Predictive modelling of a novel anti-adhesion therapy to combat bacterial colonisation of burn wounds | | 0 |
| Predictive Models from Quantum Computer Benchmarks | | 0 |
| Prepare for Trouble and Make it Double. Supervised and Unsupervised Stacking for AnomalyBased Intrusion Detection | | 0 |
| Benchmarking Machine Reading Comprehension: A Psychological Perspective | | 0 |
| Pretraining boosts out-of-domain robustness for pose estimation | | 0 |
| Principles and Guidelines for Evaluating Social Robot Navigation Algorithms | | 0 |
| PRISM: Complete Online Decentralized Multi-Agent Pathfinding with Rapid Information Sharing using Motion Constraints | | 0 |
| Prism: Dynamic and Flexible Benchmarking of LLMs Code Generation with Monte Carlo Tree Search | | 0 |
| Privacy-Preserving Language Model Inference with Instance Obfuscation | | 0 |
| Privacy Protection in Street-View Panoramas using Depth and Multi-View Imagery | | 0 |
| Probabilistic Robustness in Deep Learning: A Concise yet Comprehensive Guide | | 0 |
| ProBench: Benchmarking Large Language Models in Competitive Programming | | 0 |
| Problem-solving benefits of down-sampled lexicase selection | | 0 |
| Procedural Content Generation: Better Benchmarks for Transfer Reinforcement Learning | | 0 |
| Procedural Generalization by Planning with Self-Supervised World Models | | 0 |
| ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions | | 0 |
| Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning | | 0 |
| Progressive Class-level Distillation | | 0 |
| Progressive Multi-view Human Mesh Recovery with Self-Supervision | | 0 |
| Progressive with Purpose: Guiding Progressive Inpainting DNNs through Context and Structure | | 0 |
| Projective simulation applied to the grid-world and the mountain-car problem | | 0 |
| Project MPG: towards a generalized performance benchmark for LLM capabilities | | 0 |
| Prompting ChatGPT for Chinese Learning as L2: A CEFR and EBCL Level Study | | 0 |
| Prompting Scientific Names for Zero-Shot Species Recognition | | 0 |
| Prompt Sketching for Large Language Models | | 0 |
| Proof of Humanity: A Multi-Layer Network Framework for Certifying Human-Originated Content in an AI-Dominated Internet | | 0 |
| Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning | | 0 |
| ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation | | 0 |
| Protocol for Executing and Benchmarking Eight Computational Doublet-Detection Methods in Single-Cell RNA Sequencing Data Analysis | | 0 |
| Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking | | 0 |
| ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding | | 0 |
| PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice | | 0 |
| PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents | | 0 |
| Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms | | 0 |
| Share, Collaborate, Benchmark: Advancing Travel Demand Research through rigorous open-source collaboration | | 0 |
| PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation | | 0 |
| Pulse Shape-Aided Multipath Delay Estimation for Fine-Grained WiFi Sensing | | 0 |
| PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension | | 0 |
| Pushing Boundaries: Exploring Zero Shot Object Classification with Large Multimodal Models | | 0 |
| Pushing the Frontiers of Unconstrained Face Detection and Recognition: IARPA Janus Benchmark A | | 0 |
| PySTACHIO: Python Single-molecule TrAcking stoiCHiometry Intensity and simulatiOn, a flexible, extensible, beginner-friendly and optimized program for analysis of single-molecule microscopy | | 0 |
| Pythae: Unifying Generative Autoencoders in Python -- A Benchmarking Use Case | | 0 |
| Python Random Graph Generator | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |