SOTAVerified

Benchmarking

Papers

Showing 2461-2470 of 5548 papers

Title | Status | Hype
A Classification Benchmark for Artificial Intelligence Detection of Laryngeal Cancer from Patient Voice | Code | 0
Distributed Non-Convex Optimization with Sublinear Speedup under Intermittent Client Availability | Code | 0
Generalization and Regularization in DQN | Code | 0
Dissecting Sample Hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI | Code | 0
Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions | Code | 0
Benchmarking Large Language Models for Molecule Prediction Tasks | Code | 0
DispBench: Benchmarking Disparity Estimation to Synthetic Corruptions | Code | 0
Are Large Language Models Good at Utility Judgments? | Code | 0
Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection | Code | 0
DispaRisk: Auditing Fairness Through Usable Information | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified