SOTAVerified

Benchmarking

Papers

Showing 2881-2890 of 5548 papers

Title | Status | Hype
Risk Aware Benchmarking of Large Language Models | - | 0
Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms | - | 0
ProbTS: Benchmarking Point and Distributional Forecasting across Diverse Prediction Horizons | Code | 2
BeSt-LeS: Benchmarking Stroke Lesion Segmentation using Deep Supervision | Code | 0
CAFA-evaluator: A Python Tool for Benchmarking Ontological Classification Methods | - | 0
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models | Code | 1
Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach | Code | 1
On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets | - | 0
Distributed Evolution Strategies with Multi-Level Learning for Large-Scale Black-Box Optimization | - | 0
Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis | Code | 3

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified