SOTAVerified

Benchmarking

Papers

Showing 4881-4890 of 5548 papers

Title | Status | Hype
Unraveling the Capabilities of Language Models in News Summarization | Code | 0
mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale | Code | 0
FedNLP: Benchmarking Federated Learning Methods for Natural Language Processing Tasks | Code | 0
MUBen: Benchmarking the Uncertainty of Molecular Representation Models | Code | 0
The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection | Code | 0
WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection | Code | 0
FedSecurity: Benchmarking Attacks and Defenses in Federated Learning and Federated LLMs | Code | 0
Fedivertex: a Graph Dataset based on Decentralized Social Networks for Trustworthy Machine Learning | Code | 0
Feature interpretability in BCIs: exploring the role of network lateralization | Code | 0
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified