SOTAVerified

Benchmarking

Papers

Showing 1551–1560 of 5548 papers

Title | Status | Hype
--- | --- | ---
Statistical Multicriteria Evaluation of LLM-Generated Text | Code | 0
Leveling the Playing Field: Carefully Comparing Classical and Learned Controllers for Quadrotor Trajectory Tracking | | 0
Universal Music Representations? Evaluating Foundation Models on World Music Corpora | Code | 0
A Comparative Analysis of Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) as Dimensionality Reduction Techniques | | 0
OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents | | 0
Spotting tell-tale visual artifacts in face swapping videos: strengths and pitfalls of CNN detectors | | 0
Finance Language Model Evaluation (FLaME) | | 0
PGLib-CO2: A Power Grid Library for Computing and Optimizing Carbon Emissions | | 0
A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning | | 0
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
--- | --- | --- | --- | --- | ---
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified