CARE: a Benchmark Suite for the Classification and Retrieval of Enzymes

2024-06-21Code Available1· sign in to hype

Jason Yang, Ariane Mora, Shengchao Liu, Bruce J. Wittmann, Anima Anandkumar, Frances H. Arnold, Yisong Yue

Code Available — Be the first to reproduce this paper.

Code

github.com/jsunn-y/care
OfficialIn paperpytorch★ 43

Abstract

Enzymes are important proteins that catalyze chemical reactions. In recent years, machine learning methods have emerged to predict enzyme function from sequence; however, there are no standardized benchmarks to evaluate these methods. We introduce CARE, a benchmark and dataset suite for the Classification And Retrieval of Enzymes (CARE). CARE centers on two tasks: (1) classification of a protein sequence by its enzyme commission (EC) number and (2) retrieval of an EC number given a chemical reaction. For each task, we design train-test splits to evaluate different kinds of out-of-distribution generalization that are relevant to real use cases. For the classification task, we provide baselines for state-of-the-art methods. Because the retrieval task has not been previously formalized, we propose a method called Contrastive Reaction-EnzymE Pretraining (CREEP) as one of the first baselines for this task and compare it to the recent method, CLIPZyme. CARE is available at https://github.com/jsunn-y/CARE/.

Tasks

Classification Out-of-Distribution Generalization Retrieval

CARE: a Benchmark Suite for the Classification and Retrieval of Enzymes

Code

Abstract

Tasks

Reproductions