PMLB v1.0: An open source dataset collection for benchmarking machine learning methods
Joseph D. Romano, Trang T. Le, William La Cava, John T. Gregg, Daniel J. Goldberg, Natasha L. Ray, Praneel Chakraborty, Daniel Himmelstein, Weixuan Fu, Jason H. Moore
Code
- github.com/EpistasisLab/pmlb (official, referenced in paper) ★ 857
- github.com/EpistasisLab/pmlbr (official, referenced in paper) ★ 10
- github.com/EpistasisLab/pmlb-manuscript (official, referenced in paper) ★ 0
Abstract
Motivation: Novel machine learning and statistical modeling studies rely on standardized comparisons to existing methods using well-studied benchmark datasets. Few tools exist that provide rapid access to many of these datasets through a standardized, user-friendly interface that integrates well with popular data science workflows.

Results: This release of PMLB provides the largest collection of diverse, public benchmark datasets for evaluating new machine learning and data science methods, aggregated in one location. v1.0 introduces a number of critical improvements developed following discussions with the open-source community.

Availability: PMLB is available at https://github.com/EpistasisLab/pmlb. Python and R interfaces for PMLB can be installed through the Python Package Index and Comprehensive R Archive Network, respectively.
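The abstract notes that the Python and R interfaces are distributed through PyPI and CRAN. As a minimal sketch, installation would look like the following (the package names `pmlb` on PyPI and `pmlbr` on CRAN are taken from the repository links above; exact names should be confirmed against the project documentation):

```shell
# Python interface, from the Python Package Index
pip install pmlb

# R interface, from the Comprehensive R Archive Network
# (run inside an R session)
R -e 'install.packages("pmlbr")'
```

After installation, the Python package can be imported as `pmlb` and the R package loaded with `library(pmlbr)`.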