SOTAVerified

Benchmarking

Papers

Showing 3051-3060 of 5548 papers

Title | Status | Hype
NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics | - | 0
GANmut: Generating and Modifying Facial Expressions | - | 0
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results | - | 0
Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models | Code | 0
Beyond Slow Signs in High-fidelity Model Extraction | Code | 0
ClimRetrieve: A Benchmarking Dataset for Information Retrieval from Corporate Climate Disclosures | Code | 0
SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading | Code | 0
On the Evaluation of Speech Foundation Models for Spoken Language Understanding | - | 0
Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming | - | 0
Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified