SOTAVerified

Benchmarking

Papers

Showing 31–40 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| State and Memory is All You Need for Robust and Reliable AI Agents | | 0 |
| Point Cloud Compression and Objective Quality Assessment: A Survey | | 0 |
| FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation | | 0 |
| Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge | | 0 |
| mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale | Code | 0 |
| Benchmarking Deep Learning and Vision Foundation Models for Atypical vs. Normal Mitosis Classification with Cross-Dataset Evaluation | Code | 0 |
| CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions | Code | 1 |
| inMOTIFin: a lightweight end-to-end simulation software for regulatory sequences | Code | 0 |
| FixCLR: Negative-Class Contrastive Learning for Semi-Supervised Domain Generalization | | 0 |
| MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |