SOTAVerified

Benchmarking

Papers

Showing 491-500 of 5548 papers

Title | Status | Hype
DACBench: A Benchmark Library for Dynamic Algorithm Configuration | Code | 1
CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms | Code | 1
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs | Code | 1
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs | Code | 1
COSMOS: Catching Out-of-Context Misinformation with Self-Supervised Learning | Code | 1
Causality for Tabular Data Synthesis: A High-Order Structure Causal Benchmark Framework | Code | 1
A Platform for the Biomedical Application of Large Language Models | Code | 1
Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models | Code | 1
Can Language Models Make Fun? A Case Study in Chinese Comical Crosstalk | Code | 1
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified