SOTAVerified

Benchmarking

Papers

Showing 1651–1660 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Benchmarking VLMs' Reasoning About Persuasive Atypical Images | | 0 |
| Benchmarking Large Language Model Uncertainty for Prompt Optimization | Code | 0 |
| Benchmarking LLMs in Political Content Text-Annotation: Proof-of-Concept with Toxicity and Incivility Data | | 0 |
| Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering | | 0 |
| LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study | | 0 |
| Text-To-Speech Synthesis In The Wild | | 0 |
| ODAQ: Open Dataset of Audio Quality - Benchmark on GitHub | Code | 1 |
| Introducing CausalBench: A Flexible Benchmark Framework for Causal Analysis and Machine Learning | | 0 |
| Linear energy storage and flexibility model with ramp rate, ramping, deadline and capacity constraints | Code | 0 |
| Online vs Offline: A Comparative Study of First-Party and Third-Party Evaluations of Social Chatbots | | 0 |
Page 166 of 555

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |