SOTAVerified

Benchmarking

Papers

Showing 1491–1500 of 5548 papers

Title | Status | Hype
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models | Code | 1
Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks | Code | 1
MIRFLEX: Music Information Retrieval Feature Library for Extraction | Code | 1
FELM: Benchmarking Factuality Evaluation of Large Language Models | Code | 1
Smiles2Dock: an open large-scale multi-task dataset for ML-based molecular docking | Code | 1
FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging | Code | 1
FiFAR: A Fraud Detection Dataset for Learning to Defer | Code | 1
ACCESS DENIED INC: The First Benchmark Environment for Sensitivity Awareness | Code | 0
Conformal Prediction: A Theoretical Note and Benchmarking Transductive Node Classification in Graphs | Code | 0
Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified