SOTAVerified

Benchmarking

Papers

Showing 1961-1970 of 5548 papers

Title | Status | Hype
DACSA: A large-scale Dataset for Automatic summarization of Catalan and Spanish newspaper Articles | | 0
DailyQA: A Benchmark to Evaluate Web Retrieval Augmented LLMs Based on Capturing Real-World Changes | | 0
Benchmarking and Improving Generator-Validator Consistency of Language Models | | 0
Danish Airs and Grounds: A Dataset for Aerial-to-Street-Level Place Recognition and Localization | | 0
DarkBench: Benchmarking Dark Patterns in Large Language Models | | 0
DASB -- Discrete Audio and Speech Benchmark | | 0
Data Analysis in the Era of Generative AI | | 0
Data and its (dis)contents: A survey of dataset development and use in machine learning research | | 0
Data Augmentation for Continual RL via Adversarial Gradient Episodic Memory | | 0
Certifying almost all quantum states with few single-qubit measurements | | 0
Page 197 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified