SOTAVerified

Benchmarking

Papers

Showing 1461–1470 of 5548 papers

Title | Status | Hype
Advancing Histopathology with Deep Learning Under Data Scarcity: A Decade in Review | - | 0
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs | - | 0
Benchmarking Deep Reinforcement Learning for Navigation in Denied Sensor Environments | Code | 1
MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems | Code | 1
Benchmarking Transcriptomics Foundation Models for Perturbation Analysis: one PCA still rules them all | Code | 1
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models | Code | 0
Sum Secrecy Rate Maximization for Full Duplex ISAC Systems | - | 0
Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large p | Code | 0
debiaSAE: Benchmarking and Mitigating Vision-Language Model Bias | Code | 0
Trust but Verify: Programmatic VLM Evaluation in the Wild | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified