Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl
Miles Cranmer
Code
- github.com/MilesCranmer/PySR (official, in paper, PyTorch, ★ 3,446)
- github.com/milescranmer/symbolicregression.jl (official, in paper, ★ 774)
- github.com/milescranmer/pysr_paper (official, in paper, ★ 66)
- github.com/hftsoi/symbolfit (★ 64)
Abstract
PySR is an open-source library for practical symbolic regression, a type of machine learning that aims to discover human-interpretable symbolic models. PySR was developed to democratize and popularize symbolic regression for the sciences. It is built on a high-performance distributed backend and a flexible search algorithm, and it interfaces with several deep learning packages. PySR's internal search algorithm is a multi-population evolutionary algorithm built around a unique evolve-simplify-optimize loop, designed for optimization of unknown scalar constants in newly discovered empirical expressions. PySR's backend is the highly optimized Julia library SymbolicRegression.jl, which can also be used directly from Julia. The backend is capable of fusing user-defined operators into SIMD kernels at runtime, performing automatic differentiation, and distributing populations of expressions to thousands of cores across a cluster. In describing this software, we also introduce a new benchmark, "EmpiricalBench," to quantify the applicability of symbolic regression algorithms in science. This benchmark measures recovery of historical empirical equations from original and synthetic datasets.
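To make the evolve-simplify-optimize idea concrete, the following is a toy sketch in pure Python. It is not PySR's or SymbolicRegression.jl's implementation or API. Every name in it (`evaluate`, `mutate`, `simplify`, `optimize_constants`, `search`), the tuple encoding of expressions, the tournament and replace-the-worst rules, the migration schedule, and the coordinate-search constant optimizer are assumptions chosen for brevity. It only illustrates how several populations can alternate between mutating an expression, folding its constants, and tuning the remaining constants against data.

```python
import random

# Toy expression encoding (an assumption of this sketch):
# ("x",), ("const", c), ("add", a, b), ("mul", a, b).
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def evaluate(expr, x):
    if expr[0] == "x":
        return x
    if expr[0] == "const":
        return expr[1]
    return OPS[expr[0]](evaluate(expr[1], x), evaluate(expr[2], x))

def loss(expr, data):
    """Mean squared error over (x, y) pairs."""
    return sum((evaluate(expr, x) - y) ** 2 for x, y in data) / len(data)

def size(expr):
    return 1 + sum(size(c) for c in expr[1:] if isinstance(c, tuple))

def random_leaf(rng):
    return ("x",) if rng.random() < 0.5 else ("const", rng.uniform(-2.0, 2.0))

def mutate(expr, rng):
    """Evolve step: descend into a child, replace a subtree, or add an operator."""
    if expr[0] in OPS and rng.random() < 0.5:
        if rng.random() < 0.5:
            return (expr[0], mutate(expr[1], rng), expr[2])
        return (expr[0], expr[1], mutate(expr[2], rng))
    if rng.random() < 0.5:
        return random_leaf(rng)
    return (rng.choice(sorted(OPS)), expr, random_leaf(rng))

def simplify(expr):
    """Simplify step: fold operators whose operands are both constants."""
    if expr[0] not in OPS:
        return expr
    left, right = simplify(expr[1]), simplify(expr[2])
    if left[0] == "const" and right[0] == "const":
        return ("const", OPS[expr[0]](left[1], right[1]))
    return (expr[0], left, right)

def constants(expr):
    """Collect constants in left-to-right order."""
    if expr[0] == "const":
        return [expr[1]]
    if expr[0] in OPS:
        return constants(expr[1]) + constants(expr[2])
    return []

def with_constants(expr, values):
    """Rebuild expr, taking new constants left to right from the iterator."""
    if expr[0] == "const":
        return ("const", next(values))
    if expr[0] in OPS:
        left = with_constants(expr[1], values)
        right = with_constants(expr[2], values)
        return (expr[0], left, right)
    return expr

def optimize_constants(expr, data, steps=(1.0, 0.1, 0.01), max_sweeps=20):
    """Optimize step: coordinate search that only accepts loss-reducing moves."""
    best, best_loss = constants(expr), loss(expr, data)
    for step in steps:
        for _ in range(max_sweeps):
            improved = False
            for i in range(len(best)):
                for delta in (step, -step):
                    trial = best[:i] + [best[i] + delta] + best[i + 1:]
                    trial_loss = loss(with_constants(expr, iter(trial)), data)
                    if trial_loss < best_loss:
                        best, best_loss, improved = trial, trial_loss, True
            if not improved:
                break
    return with_constants(expr, iter(best))

def search(data, n_populations=3, pop_size=20, generations=20, max_size=9, seed=0):
    rng = random.Random(seed)
    score = lambda e: loss(e, data)
    pops = [[random_leaf(rng) for _ in range(pop_size)] for _ in range(n_populations)]
    for gen in range(generations):
        for pop in pops:
            parent = min(rng.sample(pop, 3), key=score)  # small tournament
            child = simplify(mutate(parent, rng))
            if size(child) > max_size:
                continue
            child = optimize_constants(child, data)
            worst = max(range(pop_size), key=lambda i: score(pop[i]))
            if score(child) < score(pop[worst]):
                pop[worst] = child
        if gen % 5 == 4:  # migration: share the overall best expression
            best = min((e for pop in pops for e in pop), key=score)
            for pop in pops:
                worst = max(range(pop_size), key=lambda i: score(pop[i]))
                pop[worst] = best
    return min((e for pop in pops for e in pop), key=score)
```

For example, `search([(float(x), 2.0 * x + 1.0) for x in range(-5, 6)])` returns a nested tuple of at most nine nodes. Whether it matches the generating expression depends on the seed and the generation budget. A replace-the-worst rule was chosen here only because it guarantees that the best loss in each population never increases, which keeps the sketch easy to reason about.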