
Recent advances in the Self-Referencing Embedding Strings (SELFIES) library

2023-02-07

Alston Lo, Robert Pollice, AkshatKumar Nigam, Andrew D. White, Mario Krenn, Alán Aspuru-Guzik

Abstract

String-based molecular representations play a crucial role in cheminformatics applications and, with the growing success of deep learning in chemistry, have been readily adopted into machine learning pipelines. However, traditional string-based representations such as SMILES are often prone to syntactic and semantic errors when produced by generative models. To address these problems, a novel representation, SELF-referencIng Embedded Strings (SELFIES), was proposed that is inherently 100% robust, alongside an accompanying open-source implementation. Since then, we have generalized SELFIES to support a wider range of molecules and semantic constraints, and streamlined its underlying grammar. We have implemented this updated representation in subsequent versions of selfies, where we have also made major advances with respect to design, efficiency, and supported features. Hence, we present the current status of selfies (version 2.1.1) in this manuscript.
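To make the robustness claim concrete, here is a minimal toy sketch of how a grammar can make every token string decode to a valence-respecting molecule. It is not the SELFIES grammar and does not use the selfies library API: the function `toy_decode`, the four-element valence table, and the restriction to unbranched chains are all assumptions made for illustration. The idea it shows is that the decoder tracks the free valence of the previous atom and clips each requested bond order to what is still available, so no input can produce an over-bonded atom.

```python
import re

# Toy valence table: an assumption for this sketch, not the library's constraint set.
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}
BOND_ORDER = {"": 1, "=": 2, "#": 3}
BOND_SYMBOL = {1: "", 2: "=", 3: "#"}

# A token looks like [C], [=O] or [#N]: optional bond prefix, then an element.
TOKEN = re.compile(r"\[([=#]?)([A-Z][a-z]?)\]")


def toy_decode(tokens: str) -> str:
    """Decode a toy token string into a linear SMILES chain within valence limits."""
    out = []
    remaining = None  # free valence on the previous atom; None before the first atom
    for bond_char, element in TOKEN.findall(tokens):
        if element not in MAX_VALENCE:
            continue  # symbols outside the toy alphabet are skipped
        if remaining is None:
            # First atom: there is nothing to bond to, so any bond prefix is ignored.
            out.append(element)
            remaining = MAX_VALENCE[element]
            continue
        if remaining == 0:
            break  # previous atom is saturated, so the chain ends here
        # Clip the requested bond order to what both atoms can still accept.
        order = min(BOND_ORDER[bond_char], remaining, MAX_VALENCE[element])
        out.append(BOND_SYMBOL[order] + element)
        remaining = MAX_VALENCE[element] - order
    return "".join(out)


print(toy_decode("[C][=O][C]"))   # C=O   (oxygen is saturated, so the last token is dropped)
print(toy_decode("[F][#C][#N]"))  # FC#N  (the triple bond to fluorine is clipped to a single bond)
```

In this toy, an invalid request is repaired rather than rejected, which is why a generative model sampling arbitrary tokens would still yield a chemically sensible chain. The real representation additionally handles branches, rings, charges, and user-defined semantic constraints, none of which this sketch attempts.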
