SOTAVerified

Benchmarking

Papers

Showing 2801-2810 of 5548 papers

Title | Status | Hype
DailyQA: A Benchmark to Evaluate Web Retrieval Augmented LLMs Based on Capturing Real-World Changes | | 0
Danish Airs and Grounds: A Dataset for Aerial-to-Street-Level Place Recognition and Localization | | 0
DarkBench: Benchmarking Dark Patterns in Large Language Models | | 0
DASB -- Discrete Audio and Speech Benchmark | | 0
Data Analysis in the Era of Generative AI | | 0
Data and its (dis)contents: A survey of dataset development and use in machine learning research | | 0
Data Augmentation for Continual RL via Adversarial Gradient Episodic Memory | | 0
Data Augmentation for Traffic Classification | | 0
Data Collection of Real-Life Knowledge Work in Context: The RLKWiC Dataset | | 0
Data-driven Approach for Static Hedging of Exchange Traded Options | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified