SOTAVerified

Benchmarking

Papers

Showing 3221-3230 of 5548 papers

Title | Status | Hype
DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs | Code | 0
WebCode2M: A Real-World Dataset for Code Generation from Webpage Designs | - | 0
From Protoscience to Epistemic Monoculture: How Benchmarking Set the Stage for the Deep Learning Revolution | - | 0
Accel-NASBench: Sustainable Benchmarking for Accelerator-Aware NAS | Code | 0
MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering | - | 0
Towards Objectively Benchmarking Social Intelligence for Language Agents at Action Level | Code | 0
HOEG: A New Approach for Object-Centric Predictive Process Monitoring | Code | 0
EFSA: Towards Event-Level Financial Sentiment Analysis | Code | 0
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models | Code | 0
A Comparison of Cryptocurrency Volatility-benchmarking New and Mature Asset Classes | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified