
Nonparametric Reaction Coordinate Optimization with Histories: A Framework for Rare Event Dynamics

2026-03-03

Polina V. Banushkina, Sergei V. Krivov

Abstract

Rare but critical events in complex systems, such as protein folding, chemical reactions, disease progression, and extreme weather or climate phenomena, are governed by complex, high-dimensional, stochastic dynamics. Identifying an optimal reaction coordinate (RC) that accurately captures the progress of these dynamics is crucial for understanding and simulating such processes. However, determining an optimal RC for realistic systems is notoriously difficult because of methodological challenges that limit the success of standard machine learning techniques. These challenges include the absence of ground truth, the lack of a loss function for general nonequilibrium dynamics, the difficulty of selecting expressive neural network architectures that avoid overfitting, the irregular and incomplete nature of many real-world trajectories, limited sampling, and the extreme data imbalance inherent in rare-event problems. Here, we introduce a nonparametric RC optimization framework that incorporates trajectory histories and circumvents these challenges, enabling robust analysis of irregular or incomplete data without requiring extensive sampling. The power of the method is demonstrated through increasingly challenging analyses of protein folding dynamics, where it yields accurate committor estimates that pass stringent validation tests and produce high-resolution free energy profiles. Its generality is further illustrated through applications to phase-space dynamics, a conceptual ocean circulation model, and a longitudinal clinical dataset. These results demonstrate that rare-event dynamics can be accurately characterized without extensive sampling of the configuration space, establishing a general, flexible, and robust framework for analyzing complex dynamical systems and longitudinal datasets.
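The committor that the abstract refers to is the probability that a trajectory passing through a given configuration reaches state B before returning to state A. As a minimal sketch of that target quantity only, assuming the dynamics have already been discretized into integer bins and that boundary states A and B are supplied as boolean masks (both hypothetical inputs, and the function name is hypothetical too), the Python function below computes a naive histogram-based estimate from a single trajectory. It is not the authors' nonparametric method. Unlike their framework, it needs every bin to be visited many times, which is exactly the sampling burden the abstract says the framework avoids.

```python
import numpy as np


def empirical_committor(states, in_A, in_B, n_bins):
    """Naive estimate of q(i) = P(reach B before A | trajectory is in bin i).

    states: integer bin index for each frame (assumed discretization).
    in_A, in_B: boolean sequences marking frames inside boundary states A and B.
    n_bins: total number of bins.
    """
    states = np.asarray(states)
    T = len(states)
    # Scan backwards in time and label each frame with the boundary the
    # trajectory hits next: 1.0 for B, 0.0 for A, NaN if neither is hit
    # before the trajectory ends. Frames inside A or B get 0 or 1 directly,
    # which enforces the committor boundary conditions.
    next_hit = np.full(T, np.nan)
    label = np.nan
    for t in range(T - 1, -1, -1):
        if in_B[t]:
            label = 1.0
        elif in_A[t]:
            label = 0.0
        next_hit[t] = label
    # Average the labels within each bin; bins with no labeled frames stay NaN.
    q = np.full(n_bins, np.nan)
    for i in range(n_bins):
        mask = (states == i) & ~np.isnan(next_hit)
        if mask.any():
            q[i] = next_hit[mask].mean()
    return q
```

For the toy trajectory `[0, 1, 2, 1, 0, 1, 2]` with bin 0 as A and bin 2 as B, the intermediate bin is followed by B twice and by A once, so its estimate is 2/3. Frames after the last boundary visit carry no information and are discarded, one simple illustration of how incomplete trajectories reduce the usable data.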