SOTAVerified

Benchmarking

Papers

Showing 71-80 of 5548 papers

Title | Status | Hype
Finance Language Model Evaluation (FLaME) | - | 0
BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models | Code | 2
Q2SAR: A Quantum Multiple Kernel Learning Approach for Drug Discovery | - | 0
PGLib-CO2: A Power Grid Library for Computing and Optimizing Carbon Emissions | - | 0
GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies | Code | 1
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge | Code | 0
A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning | - | 0
Egocentric Human-Object Interaction Detection: A New Benchmark and Method | - | 0
Deep Diffusion Models and Unsupervised Hyperspectral Unmixing for Realistic Abundance Map Synthesis | - | 0
The Price of Freedom: Exploring Expressivity and Runtime Tradeoffs in Equivariant Tensor Products | Code | 1
Page 8 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified