SOTAVerified

SRSD: Rethinking Datasets of Symbolic Regression for Scientific Discovery

2022-12-02NeurIPS 2022 AI for Science: Progress and Promises 2022Code Available1· sign in to hype

Yoshitomo Matsubara, Naoya Chiba, Ryo Igarashi, Yoshitaka Ushiku

Code Available — Be the first to reproduce this paper.

Reproduce

Code

Abstract

Symbolic Regression (SR) is a task of recovering mathematical expressions from given data and has been attracting attention from the research community to discuss its potential for scientific discovery. However, the community lacks datasets of symbolic regression for scientific discovery (SRSD) to discuss the potential of SR. To address the critical issue, we revisit datasets of SRSD to discuss the potential of symbolic regression for scientific discovery. Focused on a set of formulas used in the existing datasets based on Feynman Lectures on Physics, we recreate 120 datasets to discuss the performance of SRSD. For each of the 120 SRSD datasets, we carefully review the properties of the formula and its variables to design reasonably realistic sampling ranges of values so that our new SRSD datasets can be used for evaluating the potential of SRSD such as whether or not an SR method can (re)discover physical laws from such datasets. We conduct experiments on our new SRSD datasets using five state-of-the-art SR methods in SRBench, and the results show that the new SRSD datasets are more challenging than the original ones. Our datasets and code repository are publicly available.

Tasks

Reproductions