SOTAVerified

Benchmarking

Papers

Showing 3726–3750 of 5548 papers

Title | Status | Hype
Progressive Multi-view Human Mesh Recovery with Self-Supervision | | 0
Progressive with Purpose: Guiding Progressive Inpainting DNNs through Context and Structure | | 0
Projective simulation applied to the grid-world and the mountain-car problem | | 0
Project MPG: towards a generalized performance benchmark for LLM capabilities | | 0
Prompting ChatGPT for Chinese Learning as L2: A CEFR and EBCL Level Study | | 0
Prompting Scientific Names for Zero-Shot Species Recognition | | 0
Prompt Sketching for Large Language Models | | 0
Proof of Humanity: A Multi-Layer Network Framework for Certifying Human-Originated Content in an AI-Dominated Internet | | 0
Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning | | 0
ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation | | 0
Protocol for Executing and Benchmarking Eight Computational Doublet-Detection Methods in Single-Cell RNA Sequencing Data Analysis | | 0
Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking | | 0
ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding | | 0
PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice | | 0
PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents | | 0
Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms | | 0
Share, Collaborate, Benchmark: Advancing Travel Demand Research through rigorous open-source collaboration | | 0
PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation | | 0
Pulse Shape-Aided Multipath Delay Estimation for Fine-Grained WiFi Sensing | | 0
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension | | 0
Pushing Boundaries: Exploring Zero Shot Object Classification with Large Multimodal Models | | 0
Pushing the Frontiers of Unconstrained Face Detection and Recognition: IARPA Janus Benchmark A | | 0
PySTACHIO: Python Single-molecule TrAcking stoiCHiometry Intensity and simulatiOn, a flexible, extensible, beginner-friendly and optimized program for analysis of single-molecule microscopy | | 0
Pythae: Unifying Generative Autoencoders in Python -- A Benchmarking Use Case | | 0
Python Random Graph Generator | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified