SOTAVerified

Unconditional Molecule Generation

This task evaluates the ability of generative models to sample valid and realistic molecular structures.

The training dataset can be:

QM9 (Wu et al., 2018) - consists of 130,000 stable small organic molecules containing up to nine heavy atoms (C, N, O, F) along with hydrogens.
GEOM-DRUGS (Axelrod and Gómez-Bombarelli, 2022) - consists of 430,000 large organic molecules with up to 180 atoms.

Following prior work (Hoogeboom et al., 2022), we generally sample 10,000 molecules and compute their validity, uniqueness, and PoseBusters sanity checks (Buttenschoen et al., 2024). To ensure fair comparisons, data splits generally follow prior work (Hoogeboom et al., 2022; Vignac et al., 2023).
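Once each sampled molecule has been checked, the metrics above reduce to simple ratios. The sketch below assumes the chemistry happens upstream: each generated molecule has already been converted to a canonical SMILES string (for example after RDKit sanitization), with `None` marking a molecule that failed, and each PoseBusters result is a dictionary mapping a check name to pass/fail. The function names and check names are hypothetical, and exact conventions differ between papers; here uniqueness is taken among valid molecules, so validity × uniqueness gives the fraction of all samples that are both valid and unique.

```python
from typing import Mapping, Optional, Sequence, Tuple


def validity_uniqueness(canonical_smiles: Sequence[Optional[str]]) -> Tuple[float, float]:
    """Return (validity, uniqueness) as fractions in [0, 1].

    Assumption: each entry is the canonical SMILES of one generated molecule,
    or None if that molecule could not be built/sanitized upstream.
    """
    total = len(canonical_smiles)
    if total == 0:
        return 0.0, 0.0
    valid = [s for s in canonical_smiles if s is not None]
    # Validity: share of all samples that produced a valid molecule.
    validity = len(valid) / total
    # Uniqueness: share of valid molecules whose canonical SMILES is distinct.
    uniqueness = len(set(valid)) / len(valid) if valid else 0.0
    return validity, uniqueness


def posebusters_pass_rate(check_results: Sequence[Mapping[str, bool]]) -> float:
    """Fraction of molecules that pass every sanity check in their result dict.

    Assumption: one dict per molecule, e.g. {"sanitization": True, "bond_lengths": False};
    the check names used here are illustrative only.
    """
    if not check_results:
        return 0.0
    passed = sum(1 for checks in check_results if all(checks.values()))
    return passed / len(check_results)
```

For example, the samples `["CCO", "CCO", "c1ccccc1", None]` give validity 0.75 and uniqueness 2/3 (two distinct SMILES among three valid ones).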

Benchmark Results

GEOM-DRUGS

#	Model	Metric	Claimed (%)	Verified (%)	Status
1	TABASCO	PoseBusters Validity	92	—	Unverified
2	SemlaFlow	PoseBusters Validity	87.5	—	Unverified
3	ADiT	PoseBusters Validity	85.3	—	Unverified
4	MiDi	Validity	77.8	—	Unverified
5	EQGAT-diff	PoseBusters Validity	59.7	—	Unverified

QM9

#	Model	Metric	Claimed (%)	Verified (%)	Status
1	ADiT	Validity	94.45	—	Unverified
2	GeoLDM	Validity	93.8	—	Unverified
3	EDM	Validity	91.9	—	Unverified
4	Symphony	Validity	83.5	—	Unverified
