SOTAVerified

Benchmarking

Papers

Showing 2821-2830 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Evaluating LLP Methods: Challenges and Approaches | Code | 0 |
| Benchmark Generation Framework with Customizable Distortions for Image Classifier Robustness | Code | 0 |
| OpenDMC: An Open-Source Library and Performance Evaluation for Deep-learning-based Multi-frame Compression | Code | 0 |
| On General Language Understanding | | 0 |
| OrionBench: Benchmarking Time Series Generative Models in the Service of the End-User | | 0 |
| Quantum Long Short-Term Memory (QLSTM) vs Classical LSTM in Time Series Forecasting: A Comparative Study in Solar Power Forecasting | | 0 |
| RDBench: ML Benchmark for Relational Databases | | 0 |
| ConDefects: A New Dataset to Address the Data Leakage Concern for LLM-based Fault Localization and Program Repair | | 0 |
| XFEVER: Exploring Fact Verification across Languages | Code | 0 |
| MLFMF: Data Sets for Machine Learning for Mathematical Formalization | Code | 1 |
Page 283 of 555

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |