SOTAVerified

Benchmarking

Papers

Showing 121–130 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Solving excited states for long-range interacting trapped ions with neural networks | | 0 |
| Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech | | 0 |
| The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning | Code | 0 |
| Generative Models at the Frontier of Compression: A Survey on Generative Face Video Coding | | 0 |
| REMoH: A Reflective Evolution of Multi-objective Heuristics approach via Large Language Models | | 0 |
| SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis | | 0 |
| HuSc3D: Human Sculpture dataset for 3D object reconstruction | Code | 0 |
| RADAR: Benchmarking Language Models on Imperfect Tabular Data | Code | 1 |
| Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework | | 0 |
| SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents | | 0 |
Page 13 of 555

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |