SOTAVerified

Benchmarking

Papers

Showing 631–640 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Cross-functional transferability in universal machine learning interatomic potentials | | 0 |
| Prism: Dynamic and Flexible Benchmarking of LLMs Code Generation with Monte Carlo Tree Search | | 0 |
| A Solid-State Nanopore Signal Generator for Training Machine Learning Models | | 0 |
| Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs | Code | 0 |
| CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Code | 1 |
| A Survey of Pathology Foundation Model: Progress and Future Directions | Code | 1 |
| Do LLM Evaluators Prefer Themselves for a Reason? | Code | 0 |
| Can AI Master Construction Management (CM)? Benchmarking State-of-the-Art Large Language Models on CM Certification Exams | | 0 |
| MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models | | 0 |
| Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |