SOTAVerified

Benchmarking

Papers

Showing 2731-2740 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| DarkBench: Benchmarking Dark Patterns in Large Language Models | | 0 |
| Danish Airs and Grounds: A Dataset for Aerial-to-Street-Level Place Recognition and Localization | | 0 |
| AnyTOD: A Programmable Task-Oriented Dialog System | | 0 |
| DailyQA: A Benchmark to Evaluate Web Retrieval Augmented LLMs Based on Capturing Real-World Changes | | 0 |
| DACSA: A large-scale Dataset for Automatic summarization of Catalan and Spanish newspaper Articles | | 0 |
| Benchmarking Expressive Japanese Character Text-to-Speech with VITS and Style-BERT-VITS2 | | 0 |
| DACOS-A Manually Annotated Dataset of Code Smells | | 0 |
| Benchmarking Explanatory Models for Inertia Forecasting using Public Data of the Nordic Area | | 0 |
| Anytime Bi-Objective Optimization with a Hybrid Multi-Objective CMA-ES (HMO-CMA-ES) | | 0 |
| Adversarially Training for Audio Classifiers | | 0 |
Page 274 of 555

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |